A. Kontostathis, W. M. Pottenger, and B. D. Davison. Assessing the impact of sparsification on LSI performance. In Proceedings of the 2004 Grace Hopper Celebration of Women in Computing Conference, Chicago, IL, USA, Oct 6-9 2004. [url]
————————————–
This paper presents a technique for sparsifying the LSI matrices produced by the Singular Value Decomposition. The algorithm defines a positive and a negative threshold, and any element of the term-by-dimension matrix Tk that falls in the interval between them is set to zero. The results show no significant loss in retrieval precision or recall.
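As a rough illustration of the thresholding step, here is a minimal sketch in plain Python. The function names `sparsify` and `sparsity`, the toy values in `T_k`, and the choice to treat the interval as inclusive at both ends are assumptions made for this example; none of them come from the paper.

```python
def sparsify(matrix, neg_threshold, pos_threshold):
    """Return a copy of `matrix` in which every value inside the closed
    interval [neg_threshold, pos_threshold] is replaced by 0.0.

    Treating the interval as closed is an assumption of this sketch.
    """
    if neg_threshold > pos_threshold:
        raise ValueError("neg_threshold must not exceed pos_threshold")
    return [
        [0.0 if neg_threshold <= value <= pos_threshold else value
         for value in row]
        for row in matrix
    ]


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0.0)
    return zeros / total if total else 0.0


# Toy term-by-dimension matrix: 3 terms, 2 retained dimensions.
# The values are made up for illustration.
T_k = [
    [0.62, -0.05],
    [-0.48, 0.03],
    [0.01, 0.71],
]

sparse_T_k = sparsify(T_k, neg_threshold=-0.1, pos_threshold=0.1)
print(sparse_T_k)
print(sparsity(sparse_T_k))
```

The memory and query-time savings only appear if the result is stored in a sparse format that skips the zeros; the nested lists here are kept dense purely for readability.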
ABSTRACT
We describe an approach to information retrieval using Latent Semantic Indexing (LSI) that directly manipulates the values in the Singular Value Decomposition (SVD) matrices. We convert the dense term by dimension matrix into a sparse matrix by removing a fixed percentage of the values. We present retrieval and runtime performance results, using seven collections, which show that using this technique to remove up to 70% of the values in the term by dimension matrix results in similar or improved retrieval performance (as compared to LSI), while reducing memory requirements and query response time. Removal of 90% of the values results in significantly reduced memory requirements and dramatic improvements in query response time. Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested.
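One way to picture "removing a fixed percentage of the values" is to zero out the entries closest to zero until the target fraction is reached. The sketch below does that by ranking entries by absolute value. The ranking rule, the function name `remove_fraction`, and the toy matrix are assumptions for illustration and may differ from how the paper actually selects values.

```python
def remove_fraction(matrix, fraction):
    """Return a copy of `matrix` with the `fraction` of entries that have
    the smallest absolute value set to 0.0.

    Ranking by absolute value is an assumption of this sketch, not
    necessarily the selection rule used in the paper.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Collect (magnitude, row index, column index) and sort by magnitude.
    entries = sorted(
        (abs(value), i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )
    n_remove = int(fraction * len(entries))
    result = [list(row) for row in matrix]
    for _, i, j in entries[:n_remove]:
        result[i][j] = 0.0
    return result


# Toy term-by-dimension matrix with 10 made-up entries.
T_k = [
    [0.9, -0.02],
    [0.05, -0.8],
    [0.3, 0.01],
    [-0.04, 0.6],
    [0.1, -0.2],
]

# Remove 70% of the values, the level the abstract reports as safe.
sparse_T_k = remove_fraction(T_k, 0.7)
print(sparse_T_k)
```

With 10 entries and a fraction of 0.7, seven values are zeroed and only the three largest in magnitude survive.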
Tags: clustering, information retrieval, Latent Semantic Analysis, Singular Value Decomposition