ACL RD-TEC 1.0 Summarization of W02-2008

Paper Title:
A Very Very Large Corpus Doesn't Always Yield Reliable Estimates

Authors: James R. Curran and Miles Osborne

Other assigned terms:

  • case
  • computational complexity
  • content words
  • convergence
  • corpora
  • corpus size
  • discourse
  • distribution
  • estimation
  • events
  • fact
  • function words
  • hapax legomena
  • idiomatic expressions
  • language data
  • language model
  • language models
  • large corpora
  • large corpus
  • large training
  • large training corpora
  • linguistic
  • maximum decay ratio
  • measure
  • names
  • natural language
  • nlp tasks
  • noise
  • nouns
  • penn treebank
  • probabilities
  • probability
  • probability estimate
  • probability estimates
  • process
  • proper names
  • ptb
  • relative frequency
  • reuters corpus
  • sentence
  • statistical models
  • statistics
  • style
  • system performance
  • terms
  • text
  • text corpus
  • theorem
  • theory
  • training
  • training corpora
  • training corpus
  • training material
  • training set
  • treebank
  • uniform distribution
  • unigram
  • unigram model
  • unigram probability
  • word
  • word classes
  • word corpus
  • words

Extracted Section Types:


This page last edited on 10 May 2017.