ACL RD-TEC 1.0 Summarization of W02-1013
Paper Title:
FROM WORDS TO CORPORA: RECOGNIZING TRANSLATION
Primarily assigned technology terms:
- algorithm
- approximation
- bipartite matching
- bootstrapping
- classification
- classifier
- computing
- cross-validation
- decoding
- f matching
- greedy approximation
- information retrieval
- information retrieval systems
- language processing
- linear regression
- linking
- machine translation
- matching
- matching algorithm
- mining
- modeling
- multilingual information retrieval
- natural language processing
- normalization
- parameter estimation
- processing
- recognition
- regression
- retrieval systems
- sampling
- scoring
- search
- similarity scoring
- smoothing
- statistical modeling
- text search
- thresholding
- tokenization
- translation detection
- viterbi
- word-to-word translation
- world-wide web
Other assigned terms:
- agreement score
- approach
- bilingual dictionary
- case
- chinese corpus
- chunks
- comparable corpora
- comparable corpus
- corpora
- development set
- dictionaries
- dictionary
- disjunction
- distribution
- document
- english-chinese parallel corpus
- estimation
- evaluation task
- evaluations
- experimental results
- f-score
- french
- function words
- generation
- gold standard
- hypotheses
- implementation
- information theory
- language pair
- large corpora
- lexicon
- linear regression model
- linguistic
- linguistic resources
- log-likelihood
- log-likelihood ratio
- markup
- measure
- measures
- method
- multilingual corpus
- multilingual information
- mutual information
- names
- natural language
- noise
- parallel corpora
- parallel corpus
- parallel text
- parallel text corpora
- parallel texts
- performance comparison
- posterior
- precision
- probabilistic model
- probabilities
- probability
- probability distribution
- process
- punctuation
- regression model
- relation
- scalability
- scrambling
- search space
- segments
- sentence
- sentence level
- similarity score
- similarity scores
- statistical approach
- structural information
- technique
- terms
- test corpus
- test set
- text
- text corpora
- text length
- theory
- tokens
- training
- training corpus
- translation lexicon
- translation model
- translation models
- translation pair
- translation pairs
- translational equivalence
- translations
- unigram
- unigram model
- vertex
- web pages
- word
- word frequency
- word types
- word-level model
- word-to-word translation model
- words