ACL RD-TEC 1.0 Summarization of C02-1148
Paper Title:
INVESTIGATING THE RELATIONSHIP BETWEEN WORD SEGMENTATION PERFORMANCE AND RETRIEVAL PERFORMANCE IN CHINESE IR
Authors: Fuchun Peng and Xiangji Huang and Dale Schuurmans and Nick Cercone
Primarily assigned technology terms:
- algorithm
- automated processing
- automatic segmentation
- chinese information processing
- chinese information retrieval
- chinese retrieval
- chinese text indexing
- chinese text retrieval
- chinese word segmentation
- coding
- database
- dictionary-based word segmentation
- em algorithm
- english text retrieval
- forward match
- hidden markov
- hidden markov model
- identification
- indexing
- information extraction
- information processing
- information retrieval
- keyword extraction
- keyword weighting
- learning
- learning technique
- learning techniques
- learning-based segmentation
- lexicon pruning
- longest matching
- machine learning
- machine learning techniques
- machine translation
- markov model
- matching
- maximum matching
- measuring
- nlp
- ppm word segmentation
- processing
- pruning
- pruning strategy
- recognition
- retrieval engine
- retrieval method
- retrieval technique
- segmentation
- segmentation algorithm
- segmentation method
- segmentation process
- segmenter
- self-supervised learning
- semantic analysis
- semantic text processing
- supervised method
- supervised training
- term weighting
- text compression
- text indexing
- text processing
- text retrieval
- text segmentation
- tokenization
- tuning
- unsupervised method
- unsupervised segmentation
- unsupervised technique
- viterbi
- viterbi algorithm
- weighting
- word identification
- word recognition
- word segmentation
- word segmenter
Other assigned terms:
- ambiguity
- ambiguous words
- approach
- break
- case
- characters
- chinese characters
- chinese corpus
- chinese text
- chinese word
- chinese words
- coding scheme
- compound words
- compounds
- corpora
- correlation
- data set
- data sets
- dictionaries
- dictionary
- discourse
- document
- document collection
- document length
- english text
- evaluations
- experimental results
- f-measure
- fact
- heuristic
- index
- keyword
- language model
- lexicon
- linguistic
- mandarin chinese
- meaning
- measure
- measures
- method
- minimum description length
- n-gram
- n-gram language model
- n-gram model
- nist
- nlp applications
- parameter settings
- ph corpus
- precision
- procedure
- process
- processing tasks
- queries
- query
- recognition accuracy
- retrieval performance
- segmentation accuracy
- segmentation problem
- segmented corpus
- segments
- semantic
- semantic meaning
- sentence
- sentences
- source text
- technique
- technology
- term
- term frequency
- term weighting scheme
- terms
- test corpus
- text
- text collection
- tokens
- topics
- training
- training corpora
- training data
- trec data
- trec evaluation
- weighting formula
- weighting scheme
- word
- word segmentation accuracy
- word segmentation performance
- words