Found some effects in the information retrieval experiment

April 26, 2006, Mauro Cherubini

Today I spent some more time exploring the Visual Information Retrieval experiment. Apparently, users scored better with CNG in the second task (see the picture below). This task was much harder than the first one because the dataset contained far fewer correct results (5 out of 1000). Surprisingly, I also found a second effect, on the time required to select the relevant results: users seem to have spent less time selecting the relevant results in the first task when using CNG.

Discussing these results with my advisor, we agreed that they are interesting but not sufficient to explain the effects or to decide which method gives better results.

Some further exploration is needed. First, I have to use the Wilcoxon test for repeated measures to link the results of the first task with those of the second. Second, it will be interesting to compute the average distance between consecutively read items for each algorithm. This might be a nice parameter to represent the ability of the algorithm to group together relevant or irrelevant results.
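A minimal sketch of the second measure, assuming each read item can be reduced to an (x, y) position on the map and that the items are listed in the order they were read. The function name and the sample trail are hypothetical, not taken from the experiment.

```python
import math


def mean_consecutive_distance(read_points):
    """Mean Euclidean distance between each read item and the next one read."""
    if len(read_points) < 2:
        return 0.0  # fewer than two items: no step to measure
    steps = [
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(read_points, read_points[1:])
    ]
    return sum(steps) / len(steps)


# Made-up map positions of the items one user read, in reading order.
trail = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(mean_consecutive_distance(trail))  # 5.5
```

A lower value for one algorithm would suggest that users jumped shorter distances between the items they read, which is one possible reading of "grouping results together".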

Google Maps now covers all of Europe: Yippee!

April 25, 2006, Mauro Cherubini

Finally, the technological choice I made some time ago is starting to make full sense. It is now possible to use STAMPS everywhere in Europe with a decent level of detail. The picture below shows Neuchâtel, in Switzerland. So, please join the first trial: shoutspace [at] gmail.com!

Tags: new technology

Further analysis of the Visual Information Retrieval experiment

April 25, 2006, Mauro Cherubini

OK, so far the analysis of the map interaction shows no huge difference in the way users interact with the map. LSI seems to spread the results all over the available space. CNG, by contrast, returns rankings that are much wider than those of LSI, which pushes the points toward the top part of the quadrant.

The interesting fact that emerges is that CNG places the relevant points much closer to the query origin, in the central lower part of the map. The subsequent analysis of the selected items shows exactly this: while LSI mixes the most relevant items into the jungle of the other results, CNG keeps them much closer to the query origin, in a space that is less populated by other results (see the map below).

So, where to go next? Two ideas so far: (a) map the final selected points to see whether they are more dispersed with CNG or LSI; (b) trace the interaction trails to assess whether there are common patterns of exploration of the points on the map.
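For idea (a), one simple way to put a number on "more dispersed" is the mean distance of the selected points from their centroid. The sketch below assumes that definition, which is only one of several possible dispersion measures. The function name and the two point sets are made up for illustration and are not results.

```python
import math


def dispersion(points):
    """Mean Euclidean distance of the points from their centroid."""
    n = len(points)
    if n == 0:
        return 0.0
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / n


# Two made-up sets of selected points: a tight cluster and a wide one.
group_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
group_b = [(0, 0), (0, 6), (6, 0), (6, 6)]
print(dispersion(group_a) < dispersion(group_b))  # True
```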

Apart from this extra analysis of the map, I am more concerned with the numerical results from the experiment. Our goal is still to check whether CNG and LSI differ in the user experience. Does one of the two methods outperform the other on: (1) relevant results selected; (2) time needed to select the relevant results; (3) user perception of appropriateness; (4) number of queries composed; (5) average time spent on each query; …

Tags: information metric, information retrieval, information visualization, interaction design, map algorithms, maps, data mining

heatmap of the visual information retrieval experiment – part 3

April 24, 2006, Mauro Cherubini

Finally I corrected a couple of bugs in the script I was using for the visualization. Then I spent a considerable amount of time trying to overlay the resulting heatmap on the original results map I had obtained before. PIL is a great resource for this. I found a couple of hacks pointing in that direction [1], [2]. Unfortunately, they did not work in my case.

In the end I decided to try the built-in function called blend. It worked well for my situation.
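To make the blending step concrete: assuming the function in question is PIL's `Image.blend(image1, image2, alpha)`, it interpolates two images of the same size and mode as `image1 * (1.0 - alpha) + image2 * alpha`. The sketch below applies the same formula to small greyscale grids in plain Python, so it illustrates the arithmetic rather than calling PIL itself. The function name `blend_pixels` and the sample values are hypothetical.

```python
def blend_pixels(base, overlay, alpha):
    """Blend two equally sized greyscale grids as base*(1-alpha) + overlay*alpha."""
    same_shape = len(base) == len(overlay) and all(
        len(b_row) == len(o_row) for b_row, o_row in zip(base, overlay)
    )
    if not same_shape:
        raise ValueError("both grids must have the same size")
    return [
        [round(b * (1.0 - alpha) + o * alpha) for b, o in zip(b_row, o_row)]
        for b_row, o_row in zip(base, overlay)
    ]


# Made-up 2x2 greyscale values (0-255) for the results map and the heatmap.
results_map = [[0, 100], [200, 255]]
heatmap = [[250, 100], [0, 255]]
print(blend_pixels(results_map, heatmap, 0.5))  # [[125, 100], [100, 255]]
```

With `alpha` at 0 the result is the base image, with `alpha` at 1 it is the overlay, and values in between let the results map show through the heatmap.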

So what the script does is:

1. divide the map into a certain number of cells;

2. count the frequency with which the items pile in each cell;

3. export the resulting matrix as CSV (comma-separated values);

4. open this file with R via RPy in the script;

5. process this file to obtain the desired heatmap;

6. save the heatmap in an external file;

7. process the heatmap with PIL so that it matches the other images;

8. blend the map over the other existing images.
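Steps 1 to 3 of the list above can be sketched in plain Python as follows. This is not the original script: the function names, the map size and the sample points are assumptions, and the points are assumed to lie inside the map with non-negative coordinates. The R and PIL steps are left out.

```python
import csv
import io


def cell_counts(points, width, height, cols, rows):
    """Count how many (x, y) points fall into each cell of a cols x rows grid."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Points on the right or bottom edge are clamped into the last cell.
        col = min(int(x / width * cols), cols - 1)
        row = min(int(y / height * rows), rows - 1)
        grid[row][col] += 1
    return grid


def to_csv(grid):
    """Serialize the count matrix as comma-separated values, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


# Made-up item positions on a 100 x 100 map, binned into a 2 x 2 grid.
points = [(10, 10), (20, 15), (90, 10), (50, 95)]
grid = cell_counts(points, width=100, height=100, cols=2, rows=2)
print(to_csv(grid))
```

The resulting CSV text could then be written to a file and read by whatever tool draws the heatmap.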

The part of the code that does the “interesting” part is reported below. The result is shown in the pictures below.

Tags: clustering, map algorithms, maps


Supporting Language Learning Communities Using Mobile Blogs

April 24, 2006, Mauro Cherubini

S. A. Petersen, G. Chabert, and M. Divitini. Supporting language learning communities using mobile blogs. In Proceedings of IADIS Mobile Learning’06, Dublin, Ireland, 14-16 July 2006.

———–

Communities are important for language learners to learn and practice the language. A classroom provides a sense of community for a group of students learning a language. We aim to extend the learning arena outside of the classroom and maintain the sense of community while the students are mobile. We propose the use of a mobile community blog to encourage collaboration among the students and to bridge the disconnection that is caused when some of the students travel abroad to improve their language. We discuss considerations that have been taken into account in adapting standard blog functionalities to support a community of mobile language learners.

Tags: mobile learning, new technology, virtual online communities

heatmap of visual information retrieval experiment

April 21, 2006, Mauro Cherubini

After a couple of hacks to make my Python installation work with R, I finally managed to export a heatmap of the cereals task. The map is upside down, but in essence it shows that the most populated part of the map is the upper-central area.

The plan is to do the same for the number of items selected relative to the total in each spot. This might show a different usage trend. Maybe the parts of the map that were “explored” less were the most important in terms of the number of correct items retrieved there. We will see…
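A minimal sketch of that ratio, assuming two count matrices of the same shape are already available: one with the total number of items per cell and one with the number of selected items per cell. The names and numbers below are hypothetical.

```python
def selection_ratio(total_grid, selected_grid):
    """Per-cell share of items that were selected; None where a cell is empty."""
    return [
        [sel / tot if tot else None for tot, sel in zip(total_row, selected_row)]
        for total_row, selected_row in zip(total_grid, selected_grid)
    ]


# Made-up counts for a 2 x 2 grid.
total = [[10, 0], [4, 2]]
selected = [[1, 0], [2, 2]]
print(selection_ratio(total, selected))  # [[0.1, None], [0.5, 1.0]]
```

Empty cells are kept as `None` rather than 0 so that "nothing was shown here" is not confused with "nothing was selected here".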

Useful links:

[1] A nice tutorial on how to draw a heatmap with Python and R;

[2] Draw a Heat Map, the R manual page.

Tags: clustering, information visualization, map algorithms, maps

STAMPS in the press

April 20, 2006, Mauro Cherubini

The newspaper “La Tribune de Genève” published today an article on STAMPS and the field trial we are organising. The article covers location-based services and why it is so interesting to know where we are. The journalist, Emmanuel Grandjean, reports a few quotes from my interview. One of my main claims is that communication is becoming more and more fluid and that LBSs add a new dimension to this flexibility.

Another is that communication tends to be maximally efficient. Knowing where our partner is located can therefore spare us many inferences about what can be communicated, and how, given our partner's location and our own.

Tags: field tools, location awareness, Location Based Services, mobile learning, Symbian

building a heatmap of the map usage

April 19, 2006, Mauro Cherubini

Back from the Easter holidays, I discussed this cumulative map representation with Patrick. The goal is to make visible the parts of the map that attracted most of the users’ attention or time. For this reason I was trying to count the number of items that fall into each sub-area of the map, how many of these items were read by the users, and how many were selected.

One of the possible visualization strategies we discussed is to use ‘heatmaps’, a kind of two-dimensional gradient that can highlight hot spots on the map (they are widely used in cognitive science to represent eye-tracking data).

Another idea we discussed briefly is to show the sequence of actions on the map. This would eventually make it possible to detect whether there are common patterns of usage among the users (e.g., how many moved from square 1 to square 2, and so on).
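Counting such moves is straightforward once each user's trail is available as an ordered list of square numbers. The sketch below assumes that representation. The function name and the sample trails are made up.

```python
from collections import Counter


def transition_counts(trails):
    """Count moves between different squares across all users' trails."""
    counts = Counter()
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            if src != dst:  # staying in the same square is not a move
                counts[(src, dst)] += 1
    return counts


# Made-up trails: the sequence of squares each user visited.
trails = [[1, 2, 5], [1, 2, 2, 3], [4, 5]]
counts = transition_counts(trails)
print(counts[(1, 2)])  # 2
```

Calling `counts.most_common()` then lists the most frequent moves first, which is one way to spot shared exploration patterns.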

For the moment I am stuck with a minor visualization bug …

Tags: clustering, information visualization, map algorithms, maps, spatial clustering, statistics

Cumulative density map of the Virtual Retrieval experiment

April 14, 2006, Mauro Cherubini

I have been poking around to generate a visualization that can help us understand how the users interacted with the map in our virtual retrieval experiment. It was fun to discover that some zones of the map were used more than others, namely the central upper quadrant and the central left and right quadrants. The image below shows the density of the visualized results.

The map has been subdivided into 9 quadrants. In each quadrant you can see three big numbers. The upper one is the number of results that fall into the sector (even if you see only a few points, results of similar queries may be superimposed). The lower-left big number, in blue, is the number of items that have been read by the users. The red one is the number of items that have been selected by the users in that quadrant.

Tags: information metric, information retrieval, information visualization, Latent Semantic Analysis, map algorithms, maps, search engine, spatial clustering, spreading activation

Visual Retrieval Experiment

April 6, 2006, Mauro Cherubini

Dear Reader,

for my thesis work I prepared, together with Lorenzo Viscanti, an experiment on Visual Information Retrieval. I would like to ask you to participate in this experiment, which will last no more than 15 to 20 minutes.

If you agree, you will need to visit the following web page: http://www.noosfactory.com/visual%5Fir/ . There you will find some more information on the assignment: you will be presented with two tasks, each lasting 5 minutes. Each task will require you to run some queries on a collection of news articles compiled by Reuters during the ’80s. Finally, you will be asked 4 brief questions.

Please note that there will be an extra 5 minutes to download the application that you will use to participate in the task. This time may vary depending on your internet connection speed.

Tags: information retrieval, Latent Semantic Analysis, map algorithms, spatial clustering, spreading activation

Mauro Cherubini

Professor at the University of Lausanne, Switzerland
