ACL RD-TEC 1.0 Summarization of W04-2013
Paper Title:
WORDNET-BASED TEXT DOCUMENT CLUSTERING
Authors: Julian Sedding and Dimitar Kazakov
Primarily assigned technology terms:
- algorithm
- categorisation
- classification
- clustering
- clustering algorithm
- data representation
- database
- disambiguation
- document clustering
- hierarchical clustering
- k-means
- manual sense disambiguation
- morphology
- partitional clustering
- pos tagger
- pos tagging
- preprocessing
- pruning
- sense disambiguation
- splitting
- stopword removal
- tagger
- tagging
- vector space model
- weighting
- word-sense disambiguation
Other assigned terms:
- ambiguity
- annotation
- annotators
- approach
- background information
- background knowledge
- bag of words
- case
- category size
- cluster
- cluster centroid
- clusters
- community
- concept
- concepts
- corpora
- cosine distance
- data set
- dictionary
- distribution
- document
- document frequency
- document length
- document vector
- document vectors
- domain-specific corpora
- entropy
- estimation
- fact
- human annotators
- hypernym
- interpretation
- inverse document frequency
- knowledge
- lexical database
- meaning
- meanings
- measure
- measures
- method
- noise
- nouns
- ontology
- part-of-speech
- part-of-speech tag
- pos information
- pos tag
- precision
- probability
- pruning threshold
- representations
- reuters corpus
- right-hand side
- semantic
- semantic distance
- senses of a word
- similarity measure
- standard deviation
- stem
- stems
- synonyms
- synonymy
- synset
- syntactic information
- tags
- term
- term frequency
- terms
- test collection
- test corpora
- test corpus
- test data
- text
- text structure
- tokens
- topics
- vector space
- verb
- vocabulary
- word
- word classes
- word senses
- wordnet
- wordnet senses
- words