Finally I can write a RIP (Rest in Peace) for the Virtual Information Retrieval experiment. In the last analysis strategy I was trying to detect differences in how well the two algorithms (LSI and CNG) support the user in exploring the results of a query. I finally found an effect when using the relative distance index between jumps.
Basically, when the user explores the results in the map, s/he selects items one at a time. Our initial hypothesis was that the distance covered by each ‘jump’ from one selected item to the next was a function of the pertinence of the document (either as judged by the user or as an absolute value).
This hypothesis did not hold when we looked at the distances in absolute terms, because the documents are not evenly distributed in the map. If instead we look at the distances in relative terms (the number of documents that lie closer than the next document chosen), a different picture of what happened emerges.
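To make the difference between the two measures concrete, here is a minimal sketch in Python. It assumes the relative distance of a jump is the count of documents lying strictly closer to the current document than the one chosen next; the function names and the toy coordinates are hypothetical and are not the experiment's code or data.

```python
import math

def relative_jump_distance(points, current, chosen):
    """Count the documents strictly closer to `current` than `chosen` is.

    Assumption: this is one reading of the 'relative distance index';
    the original analysis may have normalised or defined it differently.
    """
    d_chosen = math.dist(points[current], points[chosen])
    return sum(
        1
        for i, p in enumerate(points)
        if i not in (current, chosen)
        and math.dist(points[current], p) < d_chosen
    )

def jump_distances(points, selection_order):
    """Return (absolute, relative) distance for each consecutive jump."""
    return [
        (math.dist(points[a], points[b]), relative_jump_distance(points, a, b))
        for a, b in zip(selection_order, selection_order[1:])
    ]

# Toy map (made-up coordinates): a dense group on the left, two isolated items on the right.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
# The user selects document 0, then 3, then 4.
jumps = jump_distances(points, [0, 3, 4])
print(jumps)  # [(10.0, 2), (1.0, 0)]
```

In this toy map the first jump is long in absolute terms but skips only two documents, which is the kind of pattern the relative index is meant to capture when documents are unevenly spread.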
In fact, CNG turns out to be more effective at separating the good results from the rest of the documents returned by the query. We saw this in the heatmap, where LSI spreads the results all around the map. As a consequence, the user can identify the good results more easily, as they are well isolated in the bottom part of the map. When the user moves away from that selection, the absolute distance increases a lot but the relative distance stays low, and certainly lower than what we observe when the user works with LSI (see test below).
> wilcox.test(distance_rel ~ as.factor(method))
Wilcoxon rank sum test with continuity correction
data: distance_rel by as.factor(method)
W = 423460.5, p-value = 2.271e-12
alternative hypothesis: true mu is not equal to 0
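For readers wondering what the W value stands for: in a two-sample rank sum test, R reports the sum of the pooled ranks of the first group (the first factor level) minus n(n+1)/2, where n is the size of that group. This equals the number of pairs in which the first-group value exceeds the second-group value, with ties counted as one half. The sketch below computes that statistic in plain Python; the function name and the sample values are made up for illustration and are not the experiment's data.

```python
def rank_sum_w(x, y):
    """Rank sum of x in the pooled sample, minus n_x * (n_x + 1) / 2.

    Tied values receive the average (mid) rank of the positions they occupy.
    """
    pooled = sorted(list(x) + list(y))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # 0-based positions i..j-1 hold 1-based ranks i+1..j; store their average.
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_x = len(x)
    return sum(midrank[v] for v in x) - n_x * (n_x + 1) / 2

# Made-up relative distances for two hypothetical groups.
group_a = [1, 2]
group_b = [2, 3]
print(rank_sum_w(group_a, group_b))  # 0.5
```

The statistic alone does not give the p-value; R derives that from the distribution of W under the null hypothesis, using a normal approximation with continuity correction when there are ties or large samples.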
In sum, we could not see an improvement in performance, perhaps because the task was too difficult, but we did register an interesting effect in the user's strategy, one that reflects the ability of each method to cluster the results in a way that helps the user.