ACL RD-TEC 1.0 Summarization of J01-1001

Paper Title:
Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus

Authors: Mikio Yamamoto and Kenneth W. Church

Other assigned terms:

  • alphabet
  • approach
  • array
  • case
  • characters
  • chinese characters
  • cluster
  • compact representation
  • compositionality
  • concordance
  • corpora
  • correlation
  • data structure
  • dictionaries
  • dictionary
  • discourse
  • discourse structure
  • distribution
  • document
  • document frequency
  • empty string
  • english corpus
  • english dictionary
  • english text
  • english translation
  • events
  • expository convenience
  • fact
  • foreign words
  • general vocabulary
  • grammar
  • heuristics
  • implementation
  • independence assumption
  • interpolation
  • interpretation
  • inverse document frequency
  • japanese corpus
  • japanese text
  • japanese word
  • kanji
  • katakana
  • keyword
  • large corpora
  • large corpus
  • lexicographer
  • lexicography
  • likelihood
  • linguist
  • linguistics
  • method
  • mutual information
  • n-gram
  • n-grams
  • names
  • natural language
  • nlp applications
  • noise
  • noisy channel
  • occurrence probability
  • perplexity
  • phrase
  • prepositions
  • probabilities
  • probability
  • procedure
  • process
  • processing time
  • recursion
  • ridf value
  • search space
  • statistical natural language
  • statistics
  • substring
  • suffix
  • suffixes
  • symbol
  • technical terminology
  • technique
  • term
  • term frequency
  • terms
  • text
  • tokens
  • trees
  • vocabulary
  • wall street journal corpus
  • word
  • word sequences
  • words
  • wsj corpus

This page last edited on 10 May 2017.