Notes from the last meeting with M. Rajman (30.5.2006)

Standard Information Retrieval (IR) techniques provide a solid treatment of similarity computation and ranking, based on well-tested and widely accepted methodologies. In a multidimensional space, the similarity between two points is computed as the cosine of the angle between the vectors representing them. This is called cosine similarity.
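
To make this concrete for myself, here is a minimal sketch of cosine similarity in plain Python. The function name and the convention of returning 0.0 for a zero vector are my own assumptions, not something from the meeting.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a zero vector is not similar to anything.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, regardless of their lengths.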

Standard IR ranking measures use enhanced variants of the tf*idf formula, such as BM25 (a.k.a. “Okapi”) and Divergence From Randomness, DFR (a.k.a. “Prosit”).
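
As a rough illustration of what BM25 computes, here is a small sketch of one common variant of the formula. The function name, the parameter defaults (k1 = 1.2, b = 0.75) and the smoothed idf are assumptions on my part. In practice a platform like Terrier would supply its own implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Uses one common BM25 variant with a smoothed, non-negative idf.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Compared with plain tf*idf, the term frequency saturates (controlled by k1) and long documents are penalized (controlled by b).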

One piece of advice from Dr. Rajman was to rely on these standard Information Retrieval techniques instead of spending energy on implementing a new ranking system. A standard Information Retrieval platform such as “Terrier” makes these standard solutions available.

As a second step in the discussion we talked about multidimensional filtering. One of the objectives of the thesis is, in fact, to define a criterion for mixing the different types of relevance that we are considering (i.e., semantic relevance, geographic relevance, popularity / social relevance, etc.). These features are neither compatible nor comparable, so one idea is not to mix them at all.

We can use a technique called sequence filtering, which works on the principle that “rejection is local and acceptance is global”: a document can be rejected by any single feature, but it is accepted only if it passes all of them. The idea is to start filtering with the feature that discriminates the most, and then move on to the next feature. In other words, this method takes all the features into account without having to combine them into a single score.
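
Here is how I picture the filtering cascade, as a minimal sketch. The function name, the feature names and the thresholds in the usage example are all hypothetical, and I am assuming the filters are already ordered from the most to the least discriminating.

```python
def sequential_filter(items, filters):
    """Apply (name, predicate) filters one after another.

    An item is dropped as soon as one filter rejects it (local rejection)
    and is kept only if every filter accepts it (global acceptance).
    """
    survivors = list(items)
    for _name, accept in filters:
        survivors = [item for item in survivors if accept(item)]
        if not survivors:
            break  # nothing left to filter
    return survivors


# Hypothetical documents with one score per type of relevance.
documents = [
    {"id": "d1", "semantic": 0.9, "distance_km": 2.0, "popularity": 0.3},
    {"id": "d2", "semantic": 0.8, "distance_km": 40.0, "popularity": 0.9},
    {"id": "d3", "semantic": 0.2, "distance_km": 1.0, "popularity": 0.9},
]
filters = [
    ("semantic", lambda d: d["semantic"] >= 0.5),
    ("geographic", lambda d: d["distance_km"] <= 10.0),
    ("popularity", lambda d: d["popularity"] >= 0.2),
]
accepted = sequential_filter(documents, filters)
```

Each feature keeps its own scale and its own threshold, so the scores never have to be combined.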

As a third step in the discussion, Dr. Rajman illustrated his view on another big challenge: the acceptance of relevant documents. The final step in defining an acceptance criterion for a given feature is to define the boundaries of acceptance.

Given a continuous distribution of relevance over the document set, we need to decide where acceptance begins. This can be done on the relevance axis, using a minimum-relevance boundary R_min, or on the document axis, using a k-best cut-off over the documents ordered by relevance. A combination of these criteria is also possible. One of the big challenges of my thesis work will be to infer these boundaries experimentally.
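
The two kinds of boundary, and their combination, can be sketched as follows. The function name and the example scores are made up for illustration. The actual values of the relevance threshold and of k are exactly what would have to be determined experimentally.

```python
def select_accepted(scored_docs, r_min=None, k=None):
    """Select accepted documents from (doc_id, relevance) pairs.

    r_min: keep only documents whose relevance is at least r_min.
    k:     keep at most the k most relevant documents.
    When both are given, both criteria must hold.
    """
    if k is not None and k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    if r_min is not None:
        ranked = [pair for pair in ranked if pair[1] >= r_min]
    if k is not None:
        ranked = ranked[:k]
    return ranked


# Made-up relevance scores.
scored = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.1)]
by_threshold = select_accepted(scored, r_min=0.5)
by_rank = select_accepted(scored, k=3)
combined = select_accepted(scored, r_min=0.5, k=1)
```

The threshold cuts on the relevance axis and can return any number of documents, while the k-best cut-off fixes the number of documents regardless of how relevant they are.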

To summarise, the challenges of my thesis work are: 1) the selection of the relevant features necessary for the retrieval process; 2) a user study to define the acceptance boundaries for the retrieval process; and 3) the final validation of these parameters through an experimental study.

SHriMP: a visualization technique for exploring software architecture

SHriMP is both an application and a technique, designed for visualizing and exploring software architecture and any other information space. SHriMP (Simple Hierarchical Multi-Perspective) is a domain-independent visualization technique designed to enhance how people browse and explore complex information spaces. Among the applications we are actively pursuing are the exploration of large software programs and the understanding of complex knowledge bases (via the Protégé tool).

Olympic Peninsula, Port Townsend and Hurricane Ridge

This weekend we visited the Olympic Peninsula, the huge piece of land across the water from Seattle. It was a great experience. We took the ferry from Edmonds to Kingston, which was worth the price. Then we visited Port Gamble, where we watched a reenactment of the American Civil War. Afterwards we left for Port Townsend, a Victorian-style village on the northern edge of the peninsula.

In Port Townsend we stayed at the Manresa Castle, once the home of the village's first mayor and later a center for religious studies run by the Jesuits. We ate at Finns [review], a gorgeous restaurant on the seaside. The seafood was excellent. I had crab cakes and wild Alaskan salmon: yum!

The biggest attraction of the weekend, though, was Hurricane Ridge, in Olympic National Park, with its visitor center at an elevation of 5,200 feet! The view was just astonishing. We could even hike a bit with the stroller.

Relo: Helping Users Manage Context during Interactive Exploratory Visualization of Large Codebases

Vineet Sinha, David Karger, Rob Miller, “Relo: Helping Users Manage Context during Interactive Exploratory Visualization of Large Codebases”, OOPSLA’05 Eclipse Technology eXchange (ETX) Workshop. Oct. 16-20, 2005, San Diego, California, USA. 5 pages. [pdf]
———-

As software systems grow in size and use more third-party libraries and frameworks, the need for developers to understand unfamiliar large codebases is rapidly increasing. In this paper, we present a tool, Relo, that supports developers' understanding by allowing interactive exploration of code. As the developer explores relationships found in the code, Relo builds and automatically manages the context in a visualization, thereby helping build the developer's mental representation of the code. Developers can group viewed artifacts or use the viewed items to ask Relo for further exploration suggestions. Relo is built as an Eclipse plug-in integrated into the Java Tooling (JDT), and uses a standard RDF-based backend that allows for maintaining code relationships and performing inferences about the relationships.

7 rules for software start-ups

Ajit Nazare at Kleiner Perkins has 7 rules for software start-ups they consider funding:

– Instant Value to customers – solve a problem or create value with the first use
– Viral adoption – Pull, not push. No direct sales force required
– Minimum IT footprint, preferably none. Hosted SaaS is best.
– Simple, intuitive user experience – no training required.
– Personalized user experience – customizable
– Easy configuration based on application or usage templates
– Context aware – adjust to location, groups, preferences, devices, etc

sketches and code story

While looking for a way to produce ASCII art of diagrams, I ran into this nice story about why visualizing code can be a fundamental help in programming (my highlights).

“If I can’t draw it, it’s probably because I don’t fully understand it”.

What prompted this article was one of those fortunate moments when you call on a colleague to help you out in a given situation (as fraught with danger as asking a colleague for help is) and you actually learn something useful. In this case I was struggling to untangle some logic that relied on 3 or 4 relatively unrelated variables in order to implement some specific business rules for my current employer.

So I called in a colleague and asked him to let me explain the logic to him so that he might help me untangle it. To assist me I drew a picture, and then another, and then another, before finally giving up and bringing up the relevant source code (what a cop out!). This at least allowed me to explain the complexity of the problem, and it was at that point that Ed (my colleague) said, “This needs a state diagram”.

To which I said, “Sounds good to me. Off you go then”. So as I explained, Ed drew the state diagram and within 10 minutes we had it all sorted out and found another logical bug to boot. Then I said, “That’s the fundamental problem with high level logic that relies for its input on the result of source code residing in a variety of different source files, it’s just about impossible to comment” (or something equally succinct!).

“What would be great would be a way of putting diagrams like this in the code”. And then it all fell into place.

Snoqualmie falls

Some pictures from our last trip to Snoqualmie Falls (WA). The view was excellent, and it is a pleasant half-hour drive from Redmond. The nearby town of Snoqualmie is also nice: you can visit the Northwest Railway Museum. Unfortunately, there is junk food everywhere... 🙁