I have completed the script that extracts sequences from the usage patterns recorded in the visual information retrieval experiment. The first matrix below is the cumulative representation for the LSI method across the different tasks; the second is the same representation for CNG. To read the cells, the following key is needed: the rows are the events at time T, while the columns are the events at time T+1. In both rows and columns the first element is case A, followed by B, C and D. The states are summarised as follows:
A: The user reads a document and then selects a document in the relative neighborhood of the first document;
B: The user reads a document and then moves away from the cluster of the first document;
C: The user selects and stays in the cluster;
D: The user selects and moves away from the cluster.
LSI:
[19, 132,  0,  0]
[83, 588, 10, 90]
[ 4,  49,  0,  0]
[14, 104,  0,  2]

CNG:
[ 5,  88,  0,  1]
[48, 537, 17, 98]
[ 2,  49,  0,  0]
[ 6, 109,  1,  2]
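
To make the counts easier to compare across rows, they can be turned into row-wise proportions, i.e. an estimate of the probability of each state at time T+1 given the state at time T. The snippet below is only a minimal sketch: the names STATES, LSI, CNG and transition_probabilities are chosen here for illustration and do not come from the extraction script.

```python
# Counts copied from the two matrices above.
# Rows = state at time T, columns = state at time T+1, both in the order A, B, C, D.
STATES = ["A", "B", "C", "D"]
LSI = [[19, 132, 0, 0], [83, 588, 10, 90], [4, 49, 0, 0], [14, 104, 0, 2]]
CNG = [[5, 88, 0, 1], [48, 537, 17, 98], [2, 49, 0, 0], [6, 109, 1, 2]]


def transition_probabilities(counts):
    """Divide each row by its total: estimated P(state at T+1 | state at T)."""
    probs = []
    for row in counts:
        total = sum(row)
        # An empty row would give 0/0; report zeros instead (not the case here).
        probs.append([c / total if total else 0.0 for c in row])
    return probs


if __name__ == "__main__":
    for name, counts in (("LSI", LSI), ("CNG", CNG)):
        print(name)
        for state, row in zip(STATES, transition_probabilities(counts)):
            print(" ", state, " ".join(f"{p:.3f}" for p in row))
```

Each printed row sums to 1, so rows with very different totals (for example A and B) can be read on the same scale.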
The next step is to verify whether these frequencies differ from the expected ones. At first sight, the most frequent move appears to be ‘read + long jump’. We now need to check whether this difference is statistically significant.
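
One possible way to run this check, assuming that "expected" means the frequencies we would see if the state at time T+1 were independent of the state at time T, is a Pearson chi-square test on each matrix. This is a sketch under that assumption, not necessarily the test that will be used in the end; the function name chi_square_independence is hypothetical.

```python
# Counts copied from the two matrices above (rows = time T, columns = time T+1).
LSI = [[19, 132, 0, 0], [83, 588, 10, 90], [4, 49, 0, 0], [14, 104, 0, 2]]
CNG = [[5, 88, 0, 1], [48, 537, 17, 98], [2, 49, 0, 0], [6, 109, 1, 2]]

# Chi-square critical value for 9 degrees of freedom at the 0.05 level.
CRITICAL_9_DOF_05 = 16.919


def chi_square_independence(counts):
    """Return (statistic, degrees of freedom) for a Pearson chi-square test.

    Expected counts assume independence: row total * column total / grand total.
    The degrees of freedom assume no row or column is entirely zero.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells that cannot occur under the model
                stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


if __name__ == "__main__":
    for name, counts in (("LSI", LSI), ("CNG", CNG)):
        stat, dof = chi_square_independence(counts)
        verdict = "differs" if stat > CRITICAL_9_DOF_05 else "does not differ"
        print(f"{name}: chi2 = {stat:.2f}, dof = {dof}, {verdict} at the 0.05 level")
```

Two caveats apply. Several cells have very small expected counts (columns C and D are sparse), so the chi-square approximation is rough and an exact or permutation test may be more appropriate. Consecutive moves by the same user are also not independent observations, which the test assumes.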
P.S. To decide whether a move stays within the cluster we used the sequence determined by the Minimal Spanning Tree, which is extremely selective in defining “in-cluster” movements.