ACL RD-TEC 1.0 Summarization of P98-1108
Paper Title:
USE OF MUTUAL INFORMATION BASED CHARACTER CLUSTERS IN DICTIONARY-LESS MORPHOLOGICAL ANALYSIS OF JAPANESE
Authors: Hideki Kashioka, Yasuhiro Kawata, Yumiko Kinjo, Andrew Finch, and Ezra W. Black
Primarily assigned technology terms:
- algorithm
- analyzer
- character clustering
- classification
- classifier
- classifiers
- clustering
- decision-tree
- decoder
- decoder algorithm
- factoring
- language modeling
- listing
- machine translation
- modeling
- morphological analysis
- morphological analyzer
- morphology
- part-of-speech tagger
- part-of-speech tagging
- processing
- search
- smoothing
- stack decoder
- statistical language modeling
- tagger
- tagging
- terminology
- tokenizer
- word clustering
Other assigned terms:
- adjective
- adverb
- alphabet
- approach
- auxiliary verb
- branching trees
- case
- character sequence
- characters
- chinese characters
- cluster
- clusters
- conversation
- conversation corpus
- corpora
- correlation
- decision-tree model
- device
- dictionary
- distribution
- edr corpus
- entropy
- evaluations
- events
- experimental results
- fact
- feature
- heuristic
- heuristic rules
- heuristics
- japanese language
- japanese text
- joint probability
- kanji
- katakana
- knowledge
- language model
- leaf
- method
- mutual information
- n-gram
- n-gram language model
- n-gram models
- orthography
- part-of-speech
- part-of-speech tag
- part-of-speech tags
- particles
- parts of speech
- parts-of-speech
- probability
- probability distribution
- probability estimates
- procedure
- process
- pronouns
- semantic
- sentence
- sentences
- substring
- syllables
- symbols
- tag sequence
- tag set
- tagged corpora
- tagging model
- tags
- target word
- technical terminology
- terms
- test set
- text
- training
- training data
- training set
- tree
- trees
- verb
- word
- word boundaries
- word classes
- word model
- word sequence
- word string
- words