Courses

Scale Unlimited offers a full set of online courses that provide in-depth coverage of Hadoop, Cascading and Solr. All courses are half-day, instructor-led interactive virtual classes, with hands-on lab exercises, real-world examples & demonstrations to reinforce content.

Introduction to Hadoop

Learn how to solve big data problems with Hadoop. We’ll start from the ground up and cover the Hadoop architecture, the Hadoop Distributed File System, MapReduce, writing & running Hadoop jobs, operations, and the Hadoop ecosystem.

Students will learn how to create and run Hadoop jobs, what types of problems are good (and bad) candidates for Hadoop-based solutions, and alternatives to writing Hadoop code.

Who Should Attend?

This course is for Java developers who want to know how and when to use Hadoop to solve Big Data problems.

Prerequisites

This course assumes no prior knowledge of Hadoop, though participants should be comfortable reading and writing Java code; familiarity with Bash will help.

Schedule & Registration

Date | Location | Time | Register
May 7 & 8, 2012 | Boston, MA | 9:00am-5:00pm Pacific | Register (part of Lucid training during Lucene Revolution)
July 11th | Redwood City | 9:00am-5:00pm Pacific | Register (part of 2 day Lucid training)

Advanced Hadoop

This course covers topics that are commonly encountered by developers applying Hadoop to larger-scale, more complex real-world data processing problems.

We’ll cover extending Hadoop via custom input/output formats, how to implement the most common Big Data processing patterns using Hadoop, cluster performance monitoring & tuning, and best practices for testing Hadoop workflows.

Who Should Attend?

This course is for Java developers who want to learn more about how best to use Hadoop to solve real-world data processing problems.

Prerequisites

This course assumes basic knowledge of Hadoop. We recommend completing our “Introduction to Hadoop” course, or equivalent hands-on experience. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Schedule & Registration

Date | Location | Time | Register
TBD | Virtual Class | 9:00am-1:30pm Pacific | Registration pending

Hadoop and Elastic MapReduce

This course takes Hadoop developers through the ins and outs of using Amazon’s Elastic MapReduce service to process big data quickly, at lower cost and with less hassle. In ten modules and four hours we’ll take developers from n00b to knowledgeable.

Topics covered include:

  • Getting Started: Signing up, credentials, access control
  • Running Jobs: Using the AWS Console to define and run jobs
  • Clusters of Servers: EC2 instance types, Hadoop configuration
  • Dealing with Data: Ephemeral vs. S3 storage
  • Wikipedia Lab: Using Hadoop to process Wikipedia data
  • Command Line Tools: Beyond the Console – Ruby client, s3cmd
  • Debugging Tips: Alive clusters, logs in S3/SimpleDB
  • Hive and Pig: Running Hive and Pig jobs
  • Hive Lab: Using Hive to process server logs
  • Advanced Topics: Spot pricing, dynamic clusters

Who Should Attend?

This course is for Hadoop developers who want to learn how best to use Amazon’s Elastic MapReduce service.

Prerequisites

This course assumes basic knowledge of Hadoop. We recommend completing our “Introduction to Hadoop” course, or equivalent hands-on experience. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Schedule & Registration

Date | Location | Time | Register
TBD | Virtual Class | 9:00am-1:30pm Pacific | Registration pending

Hadoop + Solr

This course shows how to apply the processing power of Hadoop to common data processing challenges encountered while creating Solr indexes.

We’ll look at common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale. We will explore in-depth an example of processing web crawl results to create a faceted Solr search solution. You’ll learn how Solr can be used as a NoSQL solution, and how it compares to classic NoSQL projects such as Cassandra and HBase.

Who Should Attend?

This course is for Solr developers who want to combine the flexible search functionality of Apache Solr with the Big Data processing of Apache Hadoop to create indexes for both general search and augmented data analytics.

Prerequisites

To get the most from this course you should have experience with Java, Hadoop, and developing Solr applications. We recommend completing both our “Introduction to Hadoop” and Lucid Imagination’s “Developing Search Applications with Solr” courses. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Schedule & Registration

Date | Location | Time | Register
May 7 & 8, 2012 | Boston, MA | 9:00am-5:00pm Pacific | Register (part of Lucid training during Lucene Revolution)
July 11th | Redwood City | 9:00am-5:00pm Pacific | Register (part of 2 day Lucid training)

Introduction to Cascading

This course shows how to use the Cascading open source workflow API to create high performance, scalable, reliable and maintainable data processing solutions on top of Hadoop. We’ll start with an overview of Cascading, then cover how to “think in Cascading” to model problems using Cascading’s workflow graph approach, how to leverage built-in operations, how to extend Cascading with custom operations, simple and complex grouping & joining, input/output using Taps and Schemes, and best practices.

Students will learn how to apply Cascading to a wide range of complex data processing problems.

Who Should Attend?

This course is for Hadoop developers who want to learn how to use Cascading to reduce development time (often by more than 75%), improve performance, and simplify complex data processing workflow development.

Prerequisites

To get the most from this course you should have experience with Hadoop. We recommend completing our “Introduction to Hadoop” course. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Schedule & Registration

Date | Location | Time | Register
TBD | Virtual Class | 9:00am-1:00pm Pacific | Registration pending

Advanced Cascading

This course covers topics that are commonly encountered by developers applying Cascading to larger-scale, more complex real-world data processing problems.

We’ll cover error handling with Traps, optimizing Flows, creating custom Taps and Schemes, best practices, effective use of Subassemblies, and monitoring Flows.

Who Should Attend?

This course is for Java developers who want to learn more about how best to use Cascading to solve real-world data processing problems.

Prerequisites

This course assumes basic knowledge of Hadoop and Cascading. We recommend completing both our “Introduction to Hadoop” and “Introduction to Cascading” courses. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Schedule & Registration

Date | Location | Time | Register
TBD | Virtual Class | 9:00am-1:00pm Pacific | Registration pending

Requirements

Due to the virtual environment used for labs, all participants should have a reliable, fast (1.5 Mbps or better) Internet connection.

In addition, Adobe Connect requires an up-to-date web browser: Mozilla Firefox 3 or higher; Apple Safari 4 or 5; Google Chrome; or Microsoft Internet Explorer 7, 8, or 9.

Course Materials

Participants will receive an electronic copy of all slides and handouts, as well as links to other resources and downloads.

Cancellation Policy

Registration for a class can be canceled up to 14 calendar days in advance of the class date for either a full refund, or credit towards another class. No credit or refund can be given for no-shows, or class registrations canceled less than 14 calendar days prior to a class date. If a registered participant is unable to attend the course, a substitute is welcome to take their place.

On occasion, Scale Unlimited has to cancel or reschedule a delivery. If this happens, we will let you know at least one week before the start of a virtual classroom delivery. All participants will receive a full refund, or at their discretion a credit for a future class.