ACL RD-TEC 1.0 Summarization of I05-3019
Paper Title:
UNIGRAM LANGUAGE MODEL FOR CHINESE WORD SEGMENTATION
Authors: Aitao Chen, Yiping Zhou, Anne Zhang and Gordon Sun
Primarily assigned technology terms:
- algorithm
- automaton
- chinese word segmentation
- combined training
- conditional maximum entropy
- consistency checking
- finite state
- finite state automaton
- maximum entropy
- maximum entropy model
- name recognizer
- phrase segmentation
- post-processing
- recognizer
- segmentation
- segmentation algorithm
- segmentation system
- segmenter
- state automaton
- tuning
- word segmentation
- word segmentation bakeoff
- word segmentation system
Other assigned terms:
- baseline performance
- characters
- chinese word
- corpora
- data set
- dictionary
- entropy
- evaluations
- f-score
- feature
- frequency counts
- implementation
- input text
- language model
- names
- occurrence frequency
- organization names
- part-of-speech
- person names
- phrase
- precision
- probability
- procedure
- proper name
- proper names
- regular expressions
- segmentation bakeoff
- segmentation dictionary
- sentence
- sentences
- sinica corpus
- system description
- testing data
- text
- training
- training corpus
- training data
- training data set
- training set
- unigram
- unigram language model
- unigram model
- user
- word
- word frequency
- words