They Rule!: a social visualization tool of the US ruling class

January 24, 2006, Mauro Cherubini

They Rule aims to provide a glimpse of some of the relationships of the US ruling class. It takes as its focus the boards of some of the most powerful U.S. companies, which share many of the same directors. Some individuals sit on 5, 6 or 7 of the top 500 companies. It allows users to browse through these interlocking directories and run searches on the boards and companies. A user can save a map of connections complete with their annotations and email links to these maps to others. They Rule is a starting point for research about these powerful individuals and corporations.

We should have something like this for Italian politics. It would be fun to see everything around Mr. B.
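As a rough illustration of what "interlocking" boards means in practice, here is a minimal sketch in Python. The company and director names are made-up placeholders, not data from They Rule, and the function names are hypothetical; the site's own implementation is not known to me.

```python
from collections import defaultdict
from itertools import combinations

# Made-up placeholder data: company -> set of directors on its board.
boards = {
    "Company A": {"Director 1", "Director 2", "Director 3"},
    "Company B": {"Director 2", "Director 4"},
    "Company C": {"Director 2", "Director 3", "Director 5"},
}


def seats_per_director(boards):
    """Invert the mapping: director -> set of companies they sit on."""
    seats = defaultdict(set)
    for company, directors in boards.items():
        for director in directors:
            seats[director].add(company)
    return dict(seats)


def interlocks(boards):
    """Map each pair of companies to the directors they share, if any."""
    links = {}
    for a, b in combinations(sorted(boards), 2):
        shared = boards[a] & boards[b]
        if shared:
            links[(a, b)] = shared
    return links
```

Directors with the most seats, and company pairs with the most shared directors, are the nodes and edges one would want to highlight on a map of this kind.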

Tags: information visualization, Italy, learning technology, maps, politics, social network analysis

MontyLingua: a commonsense enriched part of speech tagger

January 22, 2006, Mauro Cherubini

MontyLingua is a free, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people’s names, places, events, dates and times, and other semantic information.

Tags: Python, tagging, text data mining

Testing Visual Information Retrieval Methodologies Case Study

January 19, 2006, Mauro Cherubini

E. Morse and M. Lewis. Testing visual information retrieval methodologies case study: Comparative analysis of textual, icon, graphical, and “spring” displays. Journal of the American Society for Information Science and Technology, 53(1):28–40, 2002.

————————-

Although many different visual information retrieval systems have been proposed, few have been tested, and where testing has been performed, results were often inconclusive. Further, there is very little evidence of benchmarking systems against a common standard. An approach for testing novel interfaces is proposed that uses bottom-up, stepwise testing to allow evaluation of a visualization, itself, rather than restricting evaluation to the system instantiating it. This approach not only makes it easier to control variables, but the tests are also easier to perform. The methodology will be presented through a case study, where a new visualization technique is compared to more traditional ways of presenting data.

Tags: clustering, graphical user interface, information retrieval, information visualization

Building a document map

January 19, 2006, Mauro Cherubini

These days I am pretty busy working with Lorenzo on our super secret project on Context Network Graphs. We fell behind schedule because we were trying to find a decent way to show a document collection on a two-dimensional map. We started with an ordered list of documents with ranking values.

From this one-dimensional situation we had to develop a second dimension of information, and I can now swear that it was not easy. We chose to use triangulation, and the biggest problem we fought was that some triangles did not close properly. This document demonstrates how to compute whether three documents can be placed in a triangle.

To verify that, I wrote a quick hack in Python that showed some gaps in the circles formed between each pair of points (see picture below). This was fun. Finding out how to fix this was not fun at all. But finally …
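A minimal sketch of the triangle check, assuming each pair of documents already has a target distance. How those distances are derived from the ranking values is not shown here, and the function names are made up for illustration. Three documents fit in a triangle only if the circle around the first document (radius: its distance to the third) and the circle around the second (radius: its distance to the third) actually meet; when they do not, the triangle stays open.

```python
import math


def can_form_triangle(d_ab, d_ac, d_bc):
    """True if the three pairwise distances satisfy the triangle inequality."""
    longest = max(d_ab, d_ac, d_bc)
    return longest <= (d_ab + d_ac + d_bc) - longest


def circle_gap(d_ab, d_ac, d_bc):
    """Gap between the circle of radius d_ac around A and the circle of
    radius d_bc around B, with A and B placed d_ab apart.
    Zero when the circles touch or cross."""
    too_far = d_ab - (d_ac + d_bc)      # circles lie apart from each other
    nested = abs(d_ac - d_bc) - d_ab    # one circle lies inside the other
    return max(0.0, too_far, nested)


def place_third(d_ab, d_ac, d_bc):
    """Put A at (0, 0) and B at (d_ab, 0); return coordinates for C,
    or None if the triangle cannot be closed."""
    if d_ab <= 0 or not can_form_triangle(d_ab, d_ac, d_bc):
        return None
    x = (d_ac ** 2 - d_bc ** 2 + d_ab ** 2) / (2 * d_ab)
    y = math.sqrt(max(0.0, d_ac ** 2 - x ** 2))
    return (x, y)
```

For example, distances 3, 4 and 5 close into a triangle, while 3, 1 and 1 leave a gap of 1 between the two circles, so the third document has no valid position.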

Tags: clustering, hack, information retrieval, map algorithms, maps, Python

meX-Search: a meta search engine

January 16, 2006, Mauro Cherubini

meX-Search is a meta search engine that automatically categorizes search results into thematic groups and displays them by intuitive and interactive maps.

meX-Search is an experimental, non-commercial meta search engine built from April to July 2004 by Karsten Knorr during his diploma thesis in computer and media science [University of Applied Science Berlin]. The main idea of the thesis was the implementation of an intuitive and simple user interface for web clustering search engines.

Users of conventional Web search engines are often forced to sift through a long list of off-topic documents to find relevant results… Especially when the search query is general, it is often hard to find relevant resources among thousands of irrelevant ones. Search result clustering is an approach to handling such problems by grouping similar documents among search results into thematic groups.

meX is a meta search engine. Currently, meX gets all of its search results from the Yahoo API.

The clustering of the result snippets from Yahoo is based on Carrot2, an open source Java framework for clustering textual data. Within the Carrot2 framework, meX uses the Lingo algorithm. The authors of the Carrot2 framework and components are Dawid Weiss, Jerzy Stefanowski and Stanislaw Osinski.
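To give a feel for what grouping result snippets into thematic groups involves, here is a minimal sketch in Python. It is not the Lingo algorithm and does not use the Carrot2 or Yahoo APIs: it is a much simpler greedy grouping by word overlap (Jaccard similarity), and the stopword list, threshold and function names are assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "on"}


def terms(snippet):
    """Lowercased set of words in a snippet, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", snippet.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_snippets(snippets, threshold=0.2):
    """Assign each snippet to the most similar existing group, or start a
    new group when no group reaches the similarity threshold."""
    clusters = []  # each cluster: {"terms": set of words, "members": list}
    for snippet in snippets:
        snippet_terms = terms(snippet)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(snippet_terms, cluster["terms"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"terms": set(snippet_terms), "members": [snippet]})
        else:
            best["members"].append(snippet)
            best["terms"] |= snippet_terms
    return [cluster["members"] for cluster in clusters]
```

With an ambiguous query such as "jaguar", snippets about cars and snippets about big cats end up in separate groups because they share few words beyond the query term itself.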

Tags: clustering, google, graphical user interface, information retrieval, information visualization, maps, search engine

Carrot2: a clustering framework

January 16, 2006, Mauro Cherubini

Carrot2 is a research framework for experimenting with automated querying of various data sources (such as search engines), processing search results and their visualization.

Under the term “research”, we understand that the architecture of the system is oriented mostly toward flexibility, sometimes at a price of performance losses. Mechanisms such as data exchange via XML language, dynamically loaded components accessible via HTTP protocol, the use of Java as primary language of implementation — they all make the system very easy to tailor to one’s needs. Carrot2 was primarily built with search results clustering in mind, but it can be easily configured to do other, interesting things.

Tags: clustering, information retrieval, machine learning, search engine

Coal, China, and India: A Deadly Combination for Air Pollution?

January 14, 2006, Mauro Cherubini

I found this great portal of the Worldwatch Institute, an independent research organization that works for an environmentally sustainable and socially just society by providing compelling, accessible, and fact-based analysis of critical global issues. The portal offers access to a variety of publications that synthesize research on environmental issues. Most of the publications are available for a small payment that sustains the activity of the institute. I think it is a small price for the quality of the information they provide.

Browsing the site, I found this article on the coal consumption projections for the year 2010:

The rapid growth in coal use in China and India, where pollution controls are minimal, is adding to local and long-distance pollution. More than 80 percent of Chinese cities in a recent World Bank survey had sulfur dioxide or nitrogen dioxide emissions above the World Health Organization’s threshold.

Scientists have concluded that growing up in a city with polluted air is about as harmful to a person’s health as growing up with a parent who smokes. Although air pollution is concentrated in cities, it can move well beyond them: for example, acidic lakes in Scandinavia have been linked to pollution from factories in the United States. The World Bank projected that on average 1.8 million people would die prematurely each year between 2001 and 2020 because of air pollution.

[Figure: fossil fuel consumption chart]

Tags: ecology, environment, politics, society, statistics

Noise Sensitive Table

January 12, 2006, Mauro Cherubini

These days Jean-Baptiste is working on the prototype of the Noise Sensitive Table. The idea is that this desk should react to the users' voices, offering feedback on their turn-taking and collaborative processes. Here are some shots I took as a preview.

Tags: graphical user interface, human computer interaction, information visualization, interactive furniture, ubiquitous computing, tangible interface, usability

Civilization and Animals

January 11, 2006, Mauro Cherubini

The degree of civilization of a people is measured by the way it treats animals.

Mohandas K. Gandhi

Combine: an open source crawler

January 10, 2006, Mauro Cherubini

Combine is an open system for crawling [harvesting and threshing (indexing)] Internet resources. The name is derived from the combine-harvester since the two perform their jobs in a similar way.

The Combine was initially developed as a part of the Development of a European Service for Information on Research and Education (DESIRE) project, which was funded by the European Commission within Telematics for Science Program.

It was later modified for focused crawling by integrating the automated topic classification algorithms, also developed in DESIRE, with the crawler. This work is funded by Vinnova, the Swedish Agency for Innovation Systems (project P22504-1 A), and the EU project ALVIS.

Tags: google, information retrieval, open source, search engine

Mauro Cherubini

Professor at the University of Lausanne, Switzerland
