Some considerations about clustering and information retrieval

In a recent conversation with Lorenzo Viscanti, I raised the following points:

1. Using a supervised method for clustering can bias the results you get. As far as I know, every supervised method relies on a training set that was either prepared by an expert or taken from a specific, known context. This may conflict with the dataset you actually want to cluster because: a- the training set is epistemologically different from the dataset; b- the expert's judgment is not generalizable.

2. The definition of what a good cluster is cannot be mathematically determined. I partially agree with this. Information Retrieval does have a number of established methods for assessing the efficiency and effectiveness of retrieval.

However, the definition of what a good cluster is rests partly on human perception, which does not follow a strict logic. A couple of techniques, notably from cognitive psychology, can therefore be applied to translate the variability of human cognition into quantitative data that can be analyzed and discussed.
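
As a rough illustration of what turning human judgments into numbers could look like, here is a minimal Python sketch. It is only one possible approach and not necessarily one of the techniques I had in mind above: it compares the grouping produced by an algorithm with the grouping made by a person, using the Rand index (the fraction of item pairs on which the two groupings agree). The function name `rand_index` and the six-document example data are made up for this illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of item pairs on which two groupings agree.

    Two groupings agree on a pair when both put the two items in the
    same group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both groupings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical data: six documents grouped by an algorithm and by a person.
algorithm = [0, 0, 1, 1, 2, 2]
human = ["a", "a", "a", "b", "c", "c"]
score = rand_index(algorithm, human)
print(score)
```

With several human judges, the same function could be applied to each pair of judges first, to see how much people agree among themselves before holding an algorithm to their standard.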
