A. Lagger. Distances sémantiques et extraction de thèmes. Semester project report, CRAFT – Ecole Polytechnique Fédérale de Lausanne, Ecublens, Station 1, CH-1015 Lausanne, Switzerland, 2005.
This report presents the semester work on semantic mining carried out at CRAFT. The goal was, given a corpus of texts, to define the semantic similarity between the texts and to extract the relevant themes.
Two different algorithms were implemented and tested. The first is the Condensation Clustering Value (CCV) described by Wise (1999). The second is Term Frequency / Inverse Document Frequency (TF/IDF).
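To illustrate the second approach, here is a minimal sketch of TF/IDF weighting used for both tasks named above: comparing texts and picking out theme terms. It is not the report's implementation. The weighting variant (length-normalised term frequency times log inverse document frequency), the use of cosine similarity, the pre-tokenised input, and the function names `tf_idf`, `cosine` and `top_terms` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (tokenisation is assumed done elsewhere).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # tf = relative frequency in the document, idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_terms(vector, k=3):
    """The k highest-weighted terms of a document, as candidate themes."""
    return sorted(vector, key=vector.get, reverse=True)[:k]


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["semantic", "distance", "text"],
    ["semantic", "theme", "text"],
    ["cooking", "recipe", "pasta"],
]
vectors = tf_idf(corpus)
```

With this toy corpus, the first two documents share terms and so receive a positive similarity, while the first and third share none and score zero. A term that occurs in every document gets weight zero under this variant, which is why such terms never surface as themes.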