ACL RD-TEC 1.0 Summarization of W99-0908
Paper Title:
TEXT CLASSIFICATION BY BOOTSTRAPPING WITH KEYWORDS, EM AND SHRINKAGE
Authors: Andrew McCallum and Kamal Nigam
Primarily assigned technology terms:
- algorithm
- artificial intelligence
- bayes classifier
- bayes text classification
- bootstrap
- bootstrapping
- bootstrapping algorithm
- bootstrapping approach
- bootstrapping classification
- bootstrapping process
- classification
- classification algorithm
- classification techniques
- classifier
- classifiers
- clustering
- co-training
- co-training algorithm
- computer science
- data generation
- detection and tracking
- disambiguation
- discriminative training
- em algorithm
- expectation-maximization
- internet
- iterative algorithm
- keyword labeling
- keyword matching
- laplace smoothing
- learning
- learning algorithm
- learning algorithms
- learning techniques
- likelihood estimation
- machine learning
- machine learning techniques
- matching
- maximum likelihood
- maximum likelihood estimation
- naive bayes
- naive bayes classifier
- nlp
- parameter estimation
- parametric estimation
- perl script
- search
- search engine
- search engines
- sense disambiguation
- sense disambiguation task
- smoothing
- statistical technique
- text classification
- text classifier
- topic detection
- topic detection and tracking
- word sense disambiguation
Other assigned terms:
- anchors
- approach
- class hierarchy
- class probability
- classification accuracy
- classification error
- classification hierarchy
- computer science research
- data set
- dictionary
- disambiguation task
- disk
- distribution
- document
- document frequency
- estimation
- experimental results
- feature
- feature space
- generation
- generative model
- implementation
- intelligence
- interpolation
- keyword
- knowledge
- labeling
- large training
- leaf
- likelihood
- local maxima
- method
- prior probability
- probabilistic model
- probabilities
- probability
- probability estimate
- probability estimates
- process
- random sample
- seed
- segments
- sparse data
- sparse data problem
- technique
- term
- test set
- text
- text documents
- theoretical framework
- topics
- training
- training data
- training documents
- training examples
- uniform distribution
- unigram
- unigram model
- unlabeled examples
- vocabulary
- web page
- web pages
- word
- word distribution
- word features
- word sense
- words