Term Frequency-Inverse Document Frequency. A kind of DocumentVector. This scheme assigns a weight to each term (vocabulary word) in a given document. The weight increases in proportion to the number of times the term occurs in the document, but is offset by a factor that devalues terms common across the overall corpus.
One formula, apparently a simplification of the one in (Salton and Buckley, ’88), is the following. The weight of a term t in a document D is:
(# of occurrences of term t in this document D) * log((total # of documents)/(# of documents with mention of term t))
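As an illustration, the formula above can be sketched in Python. This is only a minimal sketch under some assumptions: documents are already split into lists of tokens, the logarithm is the natural log (the formula does not fix a base), and a term that appears in no document gets weight 0 to avoid dividing by zero. The function name tf_idf and the toy corpus are hypothetical, chosen for this example.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Weight of `term` in `document`, where `corpus` is a list of documents
    and every document is a list of tokens (assumed pre-tokenized)."""
    # Number of occurrences of the term in this document.
    tf = Counter(document)[term]
    # Number of documents in the corpus that mention the term.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        # Assumption: a term absent from the whole corpus gets weight 0.
        return 0.0
    # Natural log is an assumption; any base only rescales the weights.
    return tf * math.log(len(corpus) / df)


# Hypothetical toy corpus of three tokenized documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ate", "the", "fish"],
]

weight_the = tf_idf("the", corpus[2], corpus)    # 2 * log(3/3)
weight_fish = tf_idf("fish", corpus[2], corpus)  # 1 * log(3/1)
```

Here "the" occurs twice in the third document but appears in every document, so its weight is 2 * log(1) = 0, while "fish" occurs once and in only one document, so its weight is log(3), roughly 1.10.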
References:
Gerard Salton, Christopher Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management: an International Journal, v.24 n.5, p.513-523, 1988