ACL RD-TEC 1.0 Summarization of W04-3230
Paper Title:
APPLYING CONDITIONAL RANDOM FIELDS TO JAPANESE MORPHOLOGICAL ANALYSIS
Authors: Taku Kudo and Kaoru Yamamoto and Yuji Matsumoto
Primarily assigned technology terms:
- algorithm
- analyzer
- c++
- chinese word segmentation
- classifiers
- conditional random fields
- constrained optimization
- crfs
- cross validation
- cross-validation
- decoding
- encoding
- entity recognition
- feature selection
- forward-backward algorithm
- grouping
- hidden markov
- hidden markov models
- hmms
- information extraction
- inner product
- iterative scaling
- japanese morphological analysis
- learning
- likelihood estimation
- machine learning
- matching
- maximum entropy
- maximum likelihood
- maximum likelihood estimation
- morphological analysis
- morphological analyzers
- named entity recognition
- normalization
- optimization
- parameter estimation
- parsing
- part-of-speech tagging
- processing
- recognition
- regularization
- rule-based analyzer
- segmentation
- shallow parsing
- smoothing
- tagging
- unknown word processing
- validation
- viterbi
- viterbi algorithm
- word guessing
- word processing
- word segmentation
Other assigned terms:
- abbreviation
- ambiguity
- annotated corpora
- approach
- auxiliary verb
- bias
- boundary ambiguity
- characters
- chinese characters
- chinese word
- community
- conditional probability
- conjugation form
- corpora
- data set
- data sets
- data sparseness
- data sparseness problem
- disk
- distribution
- english part-of-speech
- entropy
- entropy models
- estimation
- experimental results
- exponential model
- feature
- feature vector
- gaussian prior
- hierarchical structure
- implementation
- independence assumption
- index
- joint probability
- katakana
- knowledge
- kyoto university corpus
- labeling
- lattice
- lexicon
- likelihood
- log-likelihood
- markov models
- maximum entropy models
- method
- n-gram
- named entity
- names
- optimization problem
- part-of-speech
- particle
- penn treebank
- probabilities
- probability
- process
- proper noun
- rwcp text corpus
- sentence
- sentences
- sparseness problem
- statistics
- suffixes
- svms
- tags
- tagset
- text
- text corpus
- tokens
- training
- training data
- training set
- transition probabilities
- transition probability
- tree
- treebank
- trigram
- verb
- word
- word boundaries
- word boundary
- word boundary ambiguity
- word level
- words