Bay Area Hadoop User Group talk

September 3, 2011

Last week I gave a talk at the August HUG meetup on my current favorite topic – using search (or rather, Solr as a NoSQL solution) to improve big data analytics.

It’s the same general theme I covered at the Basis Technology conference in June – Hadoop is often used to convert petabytes of data into pie charts, but without the ability to poke at the raw data, it’s often hard to understand and validate those results.

In the good old days of small data, you could pull out spreadsheets and dive into the raw data, but that’s no longer feasible when you’re processing multi-terabyte datasets.

Solr provides a way to query that data efficiently when you use it as a poor man’s NoSQL key-value store. With something like the Cascading Solr scheme we created, it’s trivial to generate a Solr index as part of the workflow. Setting up an on-demand Solr instance is also easy, so you once again have the ability to see (query/count/inspect) the data behind the curtain.
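
As a rough illustration of what "query/count/inspect" can look like, here is a minimal sketch that builds a request URL for Solr's standard select handler. The host, the `/solr/select` path layout, and the field names `domain` and `status` are assumptions made up for the example, not details from the talk; the `q`, `rows`, `wt`, `facet` and `facet.field` parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, query, rows=10, facet_field=None):
    """Build a URL for Solr's select handler.

    base_url is a hypothetical Solr endpoint; query uses field:value syntax.
    """
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field is not None:
        # Faceting returns per-value counts, handy for sanity-checking totals.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical example: inspect five raw records for one domain and
# count how they break down by a "status" field.
url = build_solr_query(
    "http://localhost:8983/solr",
    "domain:example.com",
    rows=5,
    facet_field="status",
)
print(url)
```

Fetching that URL from a running instance would return the matching raw records plus the facet counts, which is the kind of spot check that is hard to do against the output of a Hadoop job alone.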