Today I spent some more time exploring the results of the Visual Information Retrieval experiment. Apparently, users scored better with CNG on the second task (see the picture below). This task was much more complex than the first one because the dataset contained far fewer correct results (5 out of 1000). Surprisingly, I also found a second effect, in the time required to select the relevant results: users seem to have spent less time selecting them in the first task when using CNG.
Discussing these results with my advisor, we agreed that they are interesting but not sufficient to explain the effects or to decide which method performs better.
Further exploration is needed. First, I have to apply the Wilcoxon signed-rank test for repeated measures to relate the results of the first task to those of the second. Second, it would be interesting to compute, for each algorithm, the average distance between consecutively read items. This could be a useful measure of the algorithm's ability to group relevant (or irrelevant) results together.
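The two planned analyses could be sketched roughly as follows. This is a minimal sketch under assumptions the original does not state. It assumes each user contributes one score per task, listed in the same user order, so the two tasks form paired samples. It also assumes the "distance" between consecutively read items is the Euclidean distance between their on-screen positions, taken in the order the user read them. All function names are hypothetical. The test computes an exact two-sided p-value by counting every sign assignment of the ranks. Zero differences are dropped and tied differences get average ranks. `scipy.stats.wilcoxon` offers a ready-made version of the same test.

```python
import math


def midranks(values):
    """Rank values in ascending order (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x[i], y[i].

    Zero differences are dropped and tied |differences| get midranks.
    Returns (w_plus, p_value), where w_plus is the sum of ranks of positive differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        return 0.0, 1.0
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Midranks are multiples of 0.5, so doubling them gives integers usable as indices.
    doubled = [round(2 * r) for r in ranks]
    total = sum(doubled)
    # counts[s]: number of the 2**n sign assignments whose positive doubled ranks sum to s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    # Under H0 the distribution is symmetric about total / 2; count the assignments
    # at least as far from the centre as the observed one (two-sided).
    observed = round(2 * w_plus)
    centre = total / 2
    extreme = sum(c for s, c in enumerate(counts)
                  if abs(s - centre) >= abs(observed - centre))
    return w_plus, extreme / 2 ** len(doubled)


def mean_consecutive_distance(positions):
    """Mean Euclidean distance between consecutively read items.

    positions: on-screen coordinates (tuples) of the items, in reading order.
    """
    if len(positions) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    return sum(steps) / len(steps)


# Hypothetical data: per-user scores on task 1 and task 2 (same user order).
task1 = [5, 3, 8, 6]
task2 = [3, 4, 5, 6]
print(wilcoxon_signed_rank(task1, task2))                   # (5.0, 0.5)
print(mean_consecutive_distance([(0, 0), (3, 4), (3, 0)]))  # 4.5
```

Computing the mean distance per algorithm, separately for relevant and irrelevant items, is one way to turn the grouping idea into a number that can be compared across methods. The paired test is then applicable to those per-user values as well.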