ACL RD-TEC 1.0 Summarization of W02-1211
Paper Title:
CONSTRUCTING OF A LARGE-SCALE CHINESE-ENGLISH PARALLEL CORPUS
Authors: Le Sun and Song Xue and Weimin Qu and Xiaofeng Wang and Yufang Sun
Primarily assigned technology terms:
- algorithm
- alignment algorithm
- automatic sentence alignment
- bilingual lexicon extraction
- chinese information processing
- chinese segmentation
- chinese word segmentation
- coding
- computational linguistics
- concordance tool
- corpus linguistics
- data exchange
- dynamic programming
- electronic text encoding
- encoding
- example-based machine translation
- human checking
- information processing
- internet
- language engineering
- length-based alignment
- lexicon extraction
- listing
- machine translation
- maximum likelihood
- model training
- paragraph alignment
- phrase extraction
- processing
- segmentation
- sentence alignment
- text encoding
- translation model training
- word segmentation
Other assigned terms:
- alignment model
- american english
- annotation
- approach
- bibliographical information
- bilingual corpora
- bilingual lexicon
- british english
- case
- characters
- chinese characters
- chinese corpora
- chinese corpus
- chinese part-of-speech
- chinese sentence
- chinese word
- chinese words
- concordance
- corpora
- corpus encoding standard
- document
- document type definition
- english sentence
- genre
- language engineering research
- language pair
- language resource
- language resources
- lexical items
- lexicon
- likelihood
- linguistic
- linguistics
- linguistics research
- literal translation
- method
- multilingual corpus
- noise
- paragraph
- parallel corpus
- parallel texts
- part-of-speech
- part-of-speech annotation
- pos tag
- precision
- process
- processing platform
- sentence
- sentence level
- sentence pair
- sentences
- source text
- statistics
- structural information
- syntax
- tags
- teaching
- term
- terms
- text
- text encoding initiative
- training
- translation model
- tree
- tree bank
- web pages
- word
- words