Mining massive document collections by the WEBSOM method

K. Lagus, S. Kaski, and T. Kohonen. Mining massive document collections by the WEBSOM method. Information Sciences, 163(1-3):135–156, 2004.


This article presents a summary of the research on the WEBSOM method, a viable alternative to traditional text-mining methods. The idea behind WEBSOM is to use a visual two-dimensional display to help in browsing document collections. In such maps, documents are displayed using representative keywords, and the distance between keywords on the map reflects the similarity between the documents they are linked to.

In the paper, the authors review the methods currently used to encode documents statistically so that they can be represented using WEBSOM. These include the vector space method, methods for dimensionality reduction (latent semantic indexing, random projection, word clustering), and rapid initialization by increasing the map size, among others.
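To make two of these encodings concrete, here is a minimal sketch of the vector space method followed by a random projection to a lower dimension. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy documents, the plain word-count weighting, and the Gaussian projection matrix are all choices made here for the example.

```python
import random
from collections import Counter


def term_frequency_vectors(docs):
    """Vector space method: one word-count vector per document (assumed weighting)."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for word, count in Counter(doc.lower().split()).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def random_projection(vectors, target_dim, seed=0):
    """Multiply each vector by a random Gaussian matrix to reduce its dimension."""
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    # Hypothetical choice: i.i.d. standard normal entries, target_dim x source_dim.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(row[j] * vec[j] for j in range(source_dim)) for row in matrix]
            for vec in vectors]


# Toy collection, invented for illustration only.
docs = [
    "self organizing map of documents",
    "document map browsing",
    "neural network training",
]
vocab, vectors = term_frequency_vectors(docs)
projected = random_projection(vectors, target_dim=3)
```

The projected vectors, rather than the full-vocabulary ones, would then be the inputs to the map, which is what makes the approach attractive for very large vocabularies.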

Additionally, the authors offer some other computational shortcuts for their method, such as a parallelized batch map algorithm.
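As a rough illustration of the batch map idea, the sketch below performs one batch update on a one-dimensional map: every input is assigned to its closest model vector, and each model vector is then replaced by a neighborhood-weighted mean of all inputs. The 1-D map layout, the Gaussian neighborhood, and the `radius` parameter are assumptions made for this example and are not taken from the paper.

```python
import math


def batch_map_step(data, codebook, radius=1.0):
    """One batch-map iteration on a 1-D map whose units are indexed 0..n-1."""
    n_units, dim = len(codebook), len(codebook[0])
    numer = [[0.0] * dim for _ in range(n_units)]
    denom = [0.0] * n_units
    for x in data:
        # Winner search: the unit whose model vector is closest to x.
        winner = min(range(n_units), key=lambda i: math.dist(x, codebook[i]))
        for i in range(n_units):
            # Assumed Gaussian neighborhood over map-index distance.
            h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
            denom[i] += h
            for j in range(dim):
                numer[i][j] += h * x[j]
    # Each new model vector is the weighted mean of all inputs.
    return [[numer[i][j] / denom[i] for j in range(dim)] for i in range(n_units)]


# Toy data, invented for illustration only.
data = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
codebook = [[0.5, 0.0], [4.5, 0.0]]
updated = batch_map_step(data, codebook, radius=0.5)
```

Because the loop over `data` only accumulates sums, it could be split across data partitions and the partial `numer` and `denom` values added together afterwards, which is one plausible way such an update lends itself to parallelization.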

The paper also contains an evaluation comparing the new method with the traditional SOM algorithm, using two performance indices: the average quantization error and the classification accuracy.
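The two indices can be sketched as follows. The average quantization error is computed here as the mean distance between each input and its closest model vector. For classification accuracy, this sketch assumes that each map unit is labeled with the majority class of the inputs mapped to it; that labeling scheme, the function names, and the toy data are assumptions for illustration and may differ from the exact protocol used in the paper.

```python
import math
from collections import Counter, defaultdict


def best_matching_unit(x, codebook):
    """Index of the model vector closest to x (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))


def average_quantization_error(data, codebook):
    """Mean distance from each input to its best-matching model vector."""
    total = sum(math.dist(x, codebook[best_matching_unit(x, codebook)])
                for x in data)
    return total / len(data)


def classification_accuracy(data, labels, codebook):
    """Fraction of inputs whose label matches the majority label of their unit."""
    labels_by_unit = defaultdict(list)
    for x, label in zip(data, labels):
        labels_by_unit[best_matching_unit(x, codebook)].append(label)
    correct = sum(Counter(unit_labels).most_common(1)[0][1]
                  for unit_labels in labels_by_unit.values())
    return correct / len(data)


# Toy data, invented for illustration only.
data = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
labels = ["a", "b", "b"]
codebook = [[0.0, 0.0], [4.0, 0.0]]
aqe = average_quantization_error(data, codebook)
acc = classification_accuracy(data, labels, codebook)
```

A lower quantization error means the model vectors approximate the inputs more closely, while a higher accuracy means documents of the same class tend to land on the same map units.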

Websom Map
