Today I had a meeting with my advisor about the clustering algorithm, and I came away with a refined version of the procedure I am working on. The goal is still to find similarities between messages attached to a shared map. This kind of data is non-spatial, that is, textual, so many different semantic techniques can be used for clustering.
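To make the semantic side concrete, here is a minimal sketch of one of the simplest options: treat each message as a set of keywords and compare the sets with Jaccard similarity. The naive tokenizer, the tiny stopword list, and the function names (`keywords`, `jaccard`, `similarity_matrix`) are illustrative assumptions of mine, not the actual procedure; real messages would likely need better preprocessing.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = frozenset({"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"})


def keywords(text):
    """Lowercase the message, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(messages):
    """Symmetric n x n matrix of pairwise keyword similarities (1.0 on the diagonal)."""
    kw = [keywords(m) for m in messages]
    n = len(messages)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(kw[i], kw[j])
    return sim
```

For example, "Broken streetlight on Main Street" and "Streetlight out near Main Street" share three of their six distinct keywords, so their similarity is 0.5, while neither shares anything with "Great coffee at the corner cafe".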
Of course, these messages can also be treated as geographical entities, for which a different set of methodologies is better suited. Still, I cannot find a reasonable way to combine the semantic and the geographical techniques.
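On the geographical side, if each message is pinned to a single (latitude, longitude) point (an assumption; messages could also be attached to lines or areas), a standard choice of distance is the great-circle distance given by the haversine formula. A minimal sketch, with hypothetical function names and a mean Earth radius of 6371 km:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def distance_matrix(points):
    """Symmetric n x n matrix of pairwise distances for a list of (lat, lon) tuples."""
    n = len(points)
    return [[haversine_km(*points[i], *points[j]) for j in range(n)] for i in range(n)]
```

As a sanity check, two points one degree of longitude apart on the equator come out at roughly 111.2 km.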
At one end of a continuum we have the similarity between the keywords of messages, which has no geographical meaning. At the other end we have the distances between messages, which are properly geographical. The goal is to understand how to use these two dimensions together.
One of the ideas we came up with is to run a geographical clustering several times, each time with a different value of a semantic threshold parameter, and to try to minimize the spread (the "cloud") of the different outcomes. The mathematical formulation of this is still fuzzy.
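One possible reading of this idea, under assumptions that are mine and not yet agreed on: at each semantic threshold `t`, link two messages when they are geographically close (distance at most a hypothetical radius `eps_km`) and semantically similar (similarity at least `t`), and take the connected components as clusters. The "cloud" of outcomes is then measured, as one option among many, by the Rand index between the partitions obtained at neighbouring thresholds; a range of `t` where agreement stays high would be a stable region. The inputs `sim` and `dist` are assumed to be precomputed n x n matrices (for example keyword similarities and distances in km), and all function names are hypothetical.

```python
from itertools import combinations


def cluster_at_threshold(sim, dist, eps_km, t):
    """Connected components of the graph linking i and j when
    dist[i][j] <= eps_km AND sim[i][j] >= t. Returns one label per message."""
    n = len(sim)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= eps_km and sim[i][j] >= t:
            parent[find(i)] = find(j)
    # Relabel component roots as 0..k-1 in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def rand_index(a, b):
    """Fraction of message pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def sweep(sim, dist, eps_km, thresholds):
    """Run the geographical clustering once per semantic threshold."""
    return {t: cluster_at_threshold(sim, dist, eps_km, t) for t in thresholds}


def consecutive_agreement(results):
    """Rand index between partitions at neighbouring thresholds, in increasing order of t."""
    ts = sorted(results)
    return [(t0, t1, rand_index(results[t0], results[t1])) for t0, t1 in zip(ts, ts[1:])]
```

The connected-components rule is essentially single-linkage clustering with a distance cut-off; a density-based method such as DBSCAN could replace it without changing the outer sweep.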