ACL RD-TEC 1.0 Summarization of W06-0118
Paper Title:
VOTING BETWEEN DICTIONARY-BASED AND SUBWORD TAGGING MODELS FOR CHINESE WORD SEGMENTATION
Authors: Dong Song and Anoop Sarkar
Primarily assigned technology terms:
- algorithm
- chinese language processing
- chinese word segmentation
- computational linguistics
- conditional random field
- conditional random fields
- dictionary-based method
- error analysis
- greedy segmentation
- hanzi-level majority voting
- language processing
- learning
- majority voting
- matching
- matching algorithm
- maximum matching
- modeling
- post-processing
- processing
- recognition
- segmentation
- segmentation process
- segmentation system
- sequence learning
- sequence modeling
- shortest matching
- shortest path
- simple majority voting
- statistical methods
- statistical sequence modeling
- subword-based tagging
- tagger
- tagging
- text analysis
- voting
- word recognition
- word segmentation
- word segmentation bakeoff
- word segmentation system
- word segmentation task
Other assigned terms:
- ambiguity
- approach
- association for computational linguistics
- bigram
- characters
- chinese language
- chinese word
- corpora
- crf model
- data set
- dictionary
- dictionary entries
- experimental results
- external knowledge
- f-measure
- f-score
- feature
- feature sets
- gold test set
- input text
- knowledge
- lattice
- lattices
- lexicon
- linguistics
- mapping
- meaning
- method
- names
- organization names
- out-of-vocabulary rate
- part-of-speech
- part-of-speech information
- person names
- precision
- procedure
- process
- segmentation bakeoff
- segmentation lattice
- sentence
- statistical sequence
- system description
- system performance
- tags
- test data
- test set
- text
- training
- training corpora
- training corpus
- training data
- training data set
- training material
- training set
- understanding
- unigram
- upuc corpora
- word
- words