Big Data and Solr

This module shows how to apply the processing power of Hadoop to common data processing challenges encountered while creating Solr indexes.

We’ll look at common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale.

We will explore in detail an example of processing web crawl results to create a faceted Solr search solution.
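To make the faceted-search goal concrete, here is a minimal SolrJ sketch of the kind of query the example solution supports. The Solr URL, collection name (webcrawl), and facet field names (domain, language) are hypothetical stand-ins, not part of the course material:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetedSearchExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/webcrawl").build()) {
            SolrQuery query = new SolrQuery("body:hadoop");
            query.setFacet(true);
            query.addFacetField("domain", "language");  // hypothetical facet fields
            query.setFacetMinCount(1);
            query.setRows(10);

            QueryResponse response = solr.query(query);
            for (FacetField facet : response.getFacetFields()) {
                System.out.println(facet.getName() + ":");
                facet.getValues().forEach(count ->
                        System.out.println("  " + count.getName() + " (" + count.getCount() + ")"));
            }
        }
    }
}
```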

You’ll also learn how Solr can be used as a NoSQL solution and how it compares to classic NoSQL projects such as Cassandra and HBase.
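As a taste of the NoSQL angle, the sketch below uses Solr as a simple key-value store via SolrJ: a document is "put" under its unique id and fetched back with getById, which goes through Solr's real-time get handler (assuming the collection's update log is enabled). The collection name and fields are hypothetical:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class SolrAsKeyValueStore {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/users").build()) {
            // "Put": index a document keyed by its unique id.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "user-42");
            doc.addField("name", "Ada");
            solr.add(doc);
            solr.commit();

            // "Get": fetch by key via the real-time get handler.
            SolrDocument fetched = solr.getById("user-42");
            System.out.println(fetched.getFieldValue("name"));
        }
    }
}
```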

Who Should Attend?

Developers who need to process data at scale, where the end result is an index of the data suitable for search and/or data analytics.

Prerequisites

To get the most from this course you should have experience with Java, Hadoop, and developing Solr applications. We recommend completing both our Introduction to Hadoop module and LucidWorks’ “Developing Search Applications with Solr” course. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Participants should be comfortable reading and writing Java code; familiarity with Bash will help.

Outline

  • Overview – Generating Solr Indexes with Hadoop
  • Workflows – Connecting Big Data to Solr
  • Indexing – How to Quickly Build Big Indexes
  • Hands-on Lab – Generating a Word Co-occurrence Index (see the code sketch after this outline)
  • Data Analysis – Preparing Data for Solr
  • NoSQL – Using Solr as a Scalable Database
  • Big Data Example – 1 Billion Records in Solr
  • Summary
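
As a preview of the hands-on lab, here is a minimal Hadoop MapReduce sketch of word co-occurrence counting: the mapper emits each adjacent-word pair as a key, and the reducer sums the counts. The class names and the simple whitespace/punctuation tokenization are illustrative choices, not the lab's actual code:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CooccurrenceMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text pair = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Naive tokenization: lowercase the line and split on non-word characters.
        String[] words = value.toString().toLowerCase().split("\\W+");
        for (int i = 0; i + 1 < words.length; i++) {
            if (words[i].isEmpty() || words[i + 1].isEmpty()) continue;
            pair.set(words[i] + "\t" + words[i + 1]);  // adjacent-word pair as the key
            context.write(pair, ONE);
        }
    }
}

class CooccurrenceReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        context.write(key, new IntWritable(sum));  // pair -> co-occurrence count
    }
}
```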