Solr Analytics Presentation at Hadoop Summit Santa Clara 2012

One of our clients has a web site that helps advertisers target publishers/networks and improve ad results by analyzing millions of web pages every day. By switching from a traditional DB-centric approach to one based on Hadoop & Solr, they have cut monthly costs by more than 50%, improved response time by 4x, and been able to add new features quickly.

This analysis is handled by a complex Hadoop-based workflow whose end result is a set of unique, highly optimized Solr indexes. The data processing platform provided by Hadoop also enables scalable machine learning using Mahout. This presentation covers some of the challenges specific to moving the web site from slow, expensive real-time analytics driven by database queries to fast, affordable batch analytics and search built on Hadoop and Solr.