M. W. Berry, S. T. Dumais, and T. A. Letsche. Computational methods for intelligent information access. In Supercomputing ’95: Proceedings of the 1995 ACM/IEEE conference on Supercomputing, page 20, San Diego, California, USA, 1995. ACM Press. [pdf]
This paper gives a detailed introduction to Latent Semantic Indexing (LSI), covering its mathematical foundations and a visual example of the idea behind the singular value decomposition (SVD) at the core of the method.
The main assumption of LSI is that there is some underlying, or latent, structure in word usage that is partially obscured by variability in word choice. A truncated singular value decomposition is used to estimate this structure in word usage across documents. Retrieval is then performed using the database of singular values and vectors obtained from the truncated SVD.
In essence, the SVD reveals important information about the structure of a matrix and smooths over the minor differences in terminology that underlie synonymy and polysemy, both of which plague information retrieval.
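As a rough illustration of the retrieval scheme summarized above, here is a minimal sketch in Python with NumPy. The toy term-document matrix, the choice k = 2, and the helper names lsi_fit and lsi_query are assumptions made for this note, not taken from the paper. The query is mapped into the reduced space with the usual LSI projection (query vector times the truncated left singular vectors, scaled by the inverse singular values) and compared to documents by cosine similarity.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Document 1 uses "automobile" but never "car".
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)


def lsi_fit(A, k):
    """Return the rank-k truncated SVD factors (U_k, s_k, V_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def lsi_query(q, U_k, s_k, V_k):
    """Project a term-space query into the k-dim space and return its
    cosine similarity to every document (the rows of V_k)."""
    q_hat = (q @ U_k) / s_k
    denom = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat) + 1e-12
    return (V_k @ q_hat) / denom


U_k, s_k, V_k = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

sims = lsi_query(q, U_k, s_k, V_k)
raw_overlap = q @ A  # plain term matching, for comparison
print("LSI cosine similarities:", np.round(sims, 3))
print("raw term overlap:       ", raw_overlap)
```

In this toy setting, plain term matching gives document 1 no overlap with the query, while the reduced space places it close to the query because "car" and "automobile" co-occur with "engine". The flower documents stay unrelated. Real collections would use a sparse matrix and an iterative sparse SVD solver rather than a dense decomposition.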
The paper also details the computational cost of keeping the matrices underlying the method up to date.