ACL RD-TEC 1.0 Summarization of P06-1062
Paper Title:
A DOM TREE ALIGNMENT MODEL FOR MINING PARALLEL DATA FROM THE WEB
Authors: Lei Shi, Cheng Niu, Ming Zhou, and Jianfeng Gao
Primarily assigned technology terms:
- algorithm
- application programming interface
- classifier
- computational linguistics
- crawling
- cross-validation
- crossvalidation
- data acquisition
- data mining
- decoding
- dynamic programming
- identification
- iterative scaling
- likelihood estimation
- machine translation
- matching
- maximum entropy
- maximum entropy model
- maximum likelihood
- maximum likelihood estimation
- mining
- node deletion
- pair identification
- parallelism verification
- parameter estimation
- pattern matching
- performance enhancement
- processing
- programming interface
- searching
- sentence alignment
- signal processing
- synchronous tree substitution
- tag mapping
- tree alignment
- tree substitution
- web mining
Other assigned terms:
- adjunction
- alignment accuracy
- alignment model
- alignment task
- anchor
- annotators
- approach
- association for computational linguistics
- case
- chinese word
- chunks
- context free grammar
- corpora
- data consortium
- distribution
- document
- document object model
- edit distance
- english-chinese parallel corpus
- entropy
- estimation
- experimental results
- feature
- forest
- formalism
- french
- grammar
- hierarchical structure
- html document
- human annotators
- ibm model
- implementation
- knowledge
- leaf
- lexical information
- lexicon
- likelihood
- linguistic
- linguistic data
- linguistic data consortium
- linguistics
- logical structure
- mapping
- measure
- method
- parallel corpora
- parallel corpus
- parallel sentence
- parallel text
- parallelism
- precision
- probabilities
- probability
- procedure
- process
- sentence
- sentence pair
- sentences
- signal
- similarity score
- source sentence
- sub-tree
- substitution grammar
- syntactic tree
- syntax
- system performance
- tags
- technique
- text
- text translation probability
- time complexity
- training
- training corpora
- translation probabilities
- translation probability
- translational equivalence
- tree
- tree adjoining grammar
- tree alignment model
- tree model
- tree substitution grammar
- trees
- web documents
- web mining scheme
- web page
- web pages
- web site
- word
- words