Large scale analytics using Hadoop and Solr

April 3, 2013

I finally got around to posting the slides from the talk I gave at last year’s Hadoop Summit.

The presentation focused on how we used Hadoop & Solr to solve a big data analytics problem for one of our clients.

They have a web site that helps advertisers target publishers/networks and improve ad results by analyzing millions of web pages every day. This analysis is handled by a complex Hadoop-based workflow, where the end result is a set of unique, highly optimized Solr indexes. By switching from a traditional DB-centric approach to one based on Hadoop & Solr, they were able to cut monthly costs by more than 50%, improve response time by 4x, and quickly add new features. The data processing platform provided by Hadoop also enables scalable machine learning using Mahout.

The presentation covers some of the unique challenges in switching the web site from slow, expensive real-time analytics using database queries to fast, affordable batch analytics and search using Hadoop and Solr.
