ACL RD-TEC 1.0 Summarization of P03-1051
Paper Title:
LANGUAGE MODEL BASED ARABIC WORD SEGMENTATION
Authors: Young-Suk Lee and Kishore Papineni and Salim Roukos and Ossama Emam and Hany Hassan
Primarily assigned technology terms:
- algorithm
- analysis technique
- analyzer
- arabic word segmentation
- bootstrap
- computing
- decoder
- decoding
- disambiguation
- error rate reduction
- expectation-maximization
- expectation-maximization algorithm
- identification
- information retrieval
- language model building
- language model training
- language processing
- machine translation
- matching
- model building
- model training
- model training and decoding
- morpheme segmentation
- morphological analysis
- morphological analyzer
- morphology
- natural language processing
- natural language systems
- new stem acquisition
- parameter estimation
- part-of-speech tagging
- probability estimation
- processing
- rate reduction
- search
- search algorithm
- segmentation
- segmentation algorithm
- segmentation system
- segmenter
- statistical machine translation
- stem acquisition
- stem acquisition technique
- suffix identification
- tagging
- tokenizer
- unsupervised acquisition
- unsupervised algorithm
- unsupervised stem acquisition
- unsupervised technique
- word segmentation
- word segmentation system
- word segmenter
- word-to-word alignment
Other assigned terms:
- acquisition technique
- adjective
- adverb
- affixes
- ambiguity
- ambiguity problem
- arabic treebank
- bigram
- case
- contextual information
- corpora
- corpus size
- correlation
- derivation
- dictionary
- dutch
- error rate
- estimation
- evaluations
- exact match
- experimental results
- f-score
- foreign words
- implementation
- inflected forms
- inflectional morphology
- input text
- interpolation
- knowledge
- language model
- language model score
- language processing applications
- lemma
- likelihood
- linguistic
- linguistic resources
- meaning
- meanings
- method
- minimum description length
- model parameters
- model probability
- morpheme
- morphemes
- natural language
- natural language processing applications
- orthography
- part of speech
- part-of-speech
- part-of-speech information
- parts-of-speech
- prefixes and suffixes
- prepositions
- probabilities
- probability
- process
- pronouns
- proper noun
- punctuation
- russian
- seed
- segmentation accuracy
- segmentation ambiguity
- segmented corpus
- segments
- semantic
- semitic languages
- sentence
- stem
- stems
- suffix
- suffixes
- target languages
- technique
- test corpus
- test set
- text
- text corpora
- tokens
- toolkit
- training
- training corpus
- translations
- treebank
- trigram
- trigram language model
- trigram model
- unigram
- verb
- vocabulary
- vocabulary size
- word
- word corpus
- word error rate
- words