ACL RD-TEC 1.0 Summarization of W04-1119
Paper Title:
A SEMI-SUPERVISED APPROACH TO BUILD ANNOTATED CORPUS FOR CHINESE NAMED ENTITY RECOGNITION
Authors: Xiaoshan Fang and Jianfeng Gao and Huanye Sheng
Primarily assigned technology terms:
- algorithm
- bootstrapping
- bootstrapping approach
- bootstrapping process
- chinese named entity recognition
- chinese sentence generation
- chinese word segmentation
- context model estimation
- corpus annotation
- data annotation
- entity recognition
- entity recognition systems
- hidden markov
- hidden markov models
- likelihood estimation
- linear interpolation
- markov model
- maximum likelihood
- maximum likelihood estimation
- model estimation
- name recognizer
- named entity recognition
- parser
- recognition
- recognition systems
- recognizer
- search
- segmentation
- segmentation system
- segmenter
- semi-supervised training
- sentence generation
- smoothing
- statistical approaches
- training method
- unsupervised training
- word segmentation
- word segmentation system
- word segmenter
Other assigned terms:
- abbreviation
- annotated corpus
- annotated training corpora
- annotation
- approach
- backoff
- characters
- chinese characters
- chinese corpus
- chinese language
- chinese sentence
- chinese text
- chinese word
- chinese words
- context model
- context models
- corpora
- data sets
- data sparseness
- dictionary
- estimation
- evaluations
- f-measure
- fact
- generation
- generative probability
- interpolation
- knowledge
- lexicon
- likelihood
- linguistic
- linguistic knowledge
- location name
- markov models
- method
- named entities
- named entity
- names
- parametric model
- part-of-speech
- part-of-speech tags
- person names
- precision
- probabilities
- probability
- process
- schema
- seed
- semi-supervised approach
- sentence
- sparse data
- sparse data problem
- syntactic structure
- tags
- terms
- test set
- text
- text corpus
- training
- training corpora
- training corpus
- training data
- trigram
- trigram model
- word
- word boundaries
- word sequence
- word type
- word types
- words