Combine: an open source crawler

Combine is an open system for crawling Internet resources, that is, harvesting them and threshing (indexing) them. The name is derived from the combine harvester, since the two do their jobs in a similar way.

Combine was initially developed as part of the Development of a European Service for Information on Research and Education (DESIRE) project, which was funded by the European Commission within the Telematics for Science Program.

It is now being modified for focused crawling by integrating the crawler with the automated topic classification algorithms that were also developed in DESIRE. This work is funded by Vinnova, the Swedish Agency for Innovation Systems (project P22504-1 A), and by the EU project ALVIS.
