Focused web crawling

June 18, 2010

Recently some customers have been asking for a more concrete description of how we handle “focused web crawling” at Bixo Labs.

After answering the same questions a few times, it seemed like a good idea to post the details on our web site, hence the new page titled Focused Crawling.

The basic concepts are straightforward, and very similar to what we did at Krugle to efficiently find web pages that were likely to be of interest to software developers. At Bixo Labs we’ve generalized the concept a bit, and implemented it using Bixo and a Cascading workflow. This gives us a lot more flexibility when customizing the crawl behavior, and makes it easier for us to plug in customer-provided code at extension points such as page scoring.
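For readers who want something concrete, here is a minimal sketch of the general idea behind focused crawling with a pluggable page scorer. It is not Bixo's API and not our Cascading workflow; the names `focused_crawl`, `keyword_scorer`, and the tiny in-memory `web` dictionary are all hypothetical, and no real fetching happens. The assumption is simply that links found on a high-scoring page get crawled before links found on a low-scoring page.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]


def keyword_scorer(text: str) -> float:
    """Toy scorer: fraction of words that belong to a small topic vocabulary."""
    topic = {"java", "code", "compiler", "api"}
    words = text.lower().split()
    return sum(w in topic for w in words) / len(words) if words else 0.0


def focused_crawl(
    web: Web,
    seeds: List[str],
    scorer: Callable[[str], float],
    max_pages: int,
) -> List[Tuple[str, float]]:
    """Visit pages in priority order, where an outlink inherits its parent's score."""
    # heapq is a min-heap, so priorities are stored negated; seeds get top priority.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched: List[Tuple[str, float]] = []

    while frontier and len(fetched) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # unknown URL; a real crawler would record a fetch failure
        text, links = web[url]
        score = scorer(text)  # the extension point: any callable str -> float
        fetched.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return fetched


web: Web = {
    "a": ("java code api", ["b", "c"]),
    "b": ("cooking recipes pasta", ["d"]),
    "c": ("java compiler code", ["e"]),
    "d": ("more pasta", []),
    "e": ("api code java", []),
}
result = focused_crawl(web, ["a"], keyword_scorer, max_pages=4)
```

With a page budget of four, the off-topic page "d" is never visited, because its parent "b" scored zero and the on-topic link "e" is pulled from the frontier first. Swapping in a different scorer only requires passing another callable.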
