ACL RD-TEC 1.0 Summarization of A00-1024
Paper Title:
CATEGORIZING UNKNOWN WORDS: USING DECISION TREES TO IDENTIFY NAMES AND MISSPELLINGS
Primarily assigned technology terms:
- algorithm
- capitalization
- classifier
- data mining
- decision making
- decision tree
- decision tree algorithm
- decision tree approach
- decision tree classifier
- decision tree training
- decision trees
- decision-making
- decision-tree
- error detection
- error detection and correction
- hyphenation
- identification
- internet
- language processing
- mining
- name identification
- name recognition
- name recognizer
- natural language processing
- nlp
- nlp system
- part-of-speech tagger
- pos tagging
- processing
- recognition
- recognizer
- spell checker
- spelling
- spelling error detection
- statistical tagger
- tagger
- taggers
- tagging
- text processing
- training algorithm
- tree algorithm
- tree classifier
- voting
- weighted voting
Other assigned terms:
- abbreviation
- abbreviations
- approach
- binary feature
- call center
- capitalization information
- case
- case information
- character sequence
- characters
- checker
- concept
- confidence measure
- confusion matrix
- corpora
- corpus frequency
- data set
- data sets
- determiners
- dictionary
- edit distance
- f-score
- feature
- foreign words
- genre
- heuristic
- information sources
- knowledge
- language resources
- leaf
- lexicon
- linguistic
- linguistic resources
- measure
- measures
- modular architecture
- morphological variant
- multicomponent architecture
- names
- natural language
- noise
- nouns
- orthographic similarity
- part of speech
- part-of-speech
- parts of speech
- portability
- pos information
- precision
- predictive information
- procedure
- process
- pronouns
- proper name
- proper names
- punctuation
- recognition module
- sentences
- spelling error
- system architecture
- tag set
- tags
- tagset
- technique
- term
- terms
- test corpus
- test data
- test data set
- text
- training
- training and test data
- training corpus
- training data
- transcript
- transcripts
- tree
- trees
- word
- word corpus
- word level
- words
- world knowledge
- writing system