ACL RD-TEC 1.0 Summarization of W02-2008
Paper Title:
A VERY VERY LARGE CORPUS DOESN'T ALWAYS YIELD RELIABLE ESTIMATES
Authors: James R. Curran and Miles Osborne
Primarily assigned technology terms:
Other assigned terms:
- case
- computational complexity
- content words
- convergence
- corpora
- corpus size
- discourse
- distribution
- estimation
- events
- fact
- function words
- hapax legomena
- idiomatic expressions
- language data
- language model
- language models
- large corpora
- large corpus
- large training
- large training corpora
- linguistic
- maximum decay ratio
- measure
- names
- natural language
- nlp tasks
- noise
- nouns
- penn treebank
- probabilities
- probability
- probability estimate
- probability estimates
- process
- proper names
- ptb
- relative frequency
- reuters corpus
- sentence
- statistical models
- statistics
- style
- system performance
- terms
- text
- text corpus
- theorem
- theory
- training
- training corpora
- training corpus
- training material
- training set
- treebank
- uniform distribution
- unigram
- unigram model
- unigram probability
- word
- word classes
- word corpus
- words