Wednesday, August 15, 2012

Web crawler

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner.

  • Yahoo! Slurp is the name of the Yahoo Search crawler.
  • Bingbot is the name of Microsoft's Bing webcrawler. It replaced Msnbot.
  • FAST Crawler is a distributed crawler used by Fast Search & Transfer; a general description of its architecture is available.
  • Googlebot is described in some detail, but the reference covers only an early version of its architecture, which was written in C++ and Python. The crawler was integrated with the indexing process, because text parsing was done both for full-text indexing and for URL extraction. A URL server sent lists of URLs to be fetched by several crawling processes. During parsing, the URLs found were passed back to the URL server, which checked whether each URL had been seen before; if not, it was added to the URL server's queue. A minimal sketch of this fetch/parse/deduplicate loop follows the list.
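The early Googlebot description amounts to a simple loop: a URL server hands out URLs, fetchers download pages, a parser extracts links, and a seen-set decides whether a newly discovered URL joins the queue. Below is a minimal, single-process sketch of that pattern in Python using only the standard library; the names (URLServer, LinkExtractor), the seed URL, and the page limit are illustrative assumptions, not part of Googlebot's actual implementation.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags while parsing a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class URLServer:
    """Hands out URLs to fetch and rejects URLs it has already seen."""

    def __init__(self, seeds):
        self.seen = set(seeds)
        self.queue = deque(seeds)

    def next_url(self):
        return self.queue.popleft() if self.queue else None

    def submit(self, url):
        # Only previously unseen URLs are added to the queue.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)


def crawl(seeds, max_pages=10):
    server = URLServer(seeds)
    fetched = 0
    while fetched < max_pages:
        url = server.next_url()
        if url is None:
            break
        try:
            with urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to fetch
        fetched += 1
        parser = LinkExtractor()
        parser.feed(html)
        # Parsing yields outgoing URLs, which are handed back
        # to the URL server for deduplication and queueing.
        for link in parser.links:
            server.submit(urljoin(url, link))
    return server.seen


if __name__ == "__main__":
    print(crawl(["https://example.com/"], max_pages=5))

A production crawler would add politeness delays, robots.txt handling, and multiple parallel crawling processes; those are omitted here to keep the URL-server pattern visible.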

Open-source crawlers

  • Aspseek is a crawler, indexer and search engine written in C++ and licensed under the GPL.
  • DataparkSearch is a crawler and search engine released under the GNU General Public License.
