ACL RD-TEC 1.0 Summarization of W03-0408
Paper Title:
UPDATING AN NLP SYSTEM TO FIT NEW DOMAINS: AN EMPIRICAL STUDY ON THE SENTENCE SEGMENTATION PROBLEM
Authors: Tong Zhang, Fred Damerau, and David Johnson
Primarily assigned technology terms:
- active learning
- adaptive language modeling
- algorithm
- approximation
- binary classification
- binary classifier
- boundary detection
- categorization
- chunking
- classification
- classifier
- classifiers
- corresponding training
- data selection
- density estimation
- encoding
- feature augmentation
- feature construction
- feature selection
- java
- language modeling
- language parsing
- language processing
- learning
- learning algorithms
- linear discriminant
- machine learning
- machine learning algorithms
- measuring
- modeling
- natural language parsing
- natural language processing
- nlp
- nlp system
- nlp systems
- optimization
- parsing
- processing
- random sampling
- sample selection
- sampling
- segmentation
- segmentation system
- semi-supervised learning
- sentence boundary detection
- sentence segmentation
- speech processing
- statistical nlp
- statistical system
- supervised adaption
- supervised learning
- text categorization
- text chunking
- unsupervised learning
- winnow method
Other assigned terms:
- abbreviations
- active learning paradigm
- annotated training text
- annotation
- approach
- binary classification problem
- binary features
- brown corpus
- case
- characters
- classification accuracy
- classification error
- classification performance
- classification problem
- classification scheme
- conditional probability
- confidence measure
- data set
- data sets
- decision rule
- derivation
- distance metric
- encoding scheme
- error rate
- estimation
- evaluation data
- experimental results
- fact
- feature
- feature value
- feature vector
- implementation
- labeling
- learning paradigm
- linguistic
- linguistic expressions
- linguistic features
- measure
- medline
- method
- modeling problem
- natural language
- nlp applications
- nlp tasks
- optimization problem
- probability
- segmentation problem
- sentence
- sentence boundaries
- sentence boundary
- set size
- standard deviation
- statistical models
- statistical natural language
- statistics
- style
- symbols
- system performance
- test data
- test set
- text
- text chunking problem
- tokens
- training
- training data
- training examples
- training set
- training text
- transformation
- tree-bank
- understanding
- word