Courses

Hadoop Boot Camp

Our Hadoop Boot Camp is the fastest, most effective way to turn Java programmers into Hadoop developers.

In this two-day course you will receive an extensive overview of the Hadoop system architecture. After completing it, you will be able to identify which problems Hadoop can solve effectively, and you will be proficient in developing scalable Hadoop applications for processing big data.

We also offer an optional third day of in-depth, hands-on coding exercises that reinforce the concepts learned over the first two days. During this third day you’ll receive one-on-one help to get you past the conceptual hurdles of MapReduce and on to writing real code that solves real problems.

Instructor

Ken Krugler has been a software developer, consultant, trainer and entrepreneur for over 20 years. His Bixo Labs web mining company uses Hadoop, Cascading, and Bixo to solve large-scale web mining and data analysis problems. Previously he started Krugle in 2005, as a pioneer in code search and an early adopter/supporter of Nutch, Hadoop, Lucene and Solr. He is a committer for the Apache Tika project, an author of one of the new Lucene in Action use cases, and an expert in web crawling and data mining.

Classes

Hadoop Boot Camp classes are taught at regular intervals in major cities around the United States. A complete list of upcoming classes can be found on the events page. In addition, we provide in-house training for corporate customers with at least 7 students.

Our next class is in Boston, MA on September 29th - reserve your spot today!

Agenda

The two-day course is divided into the following general sections, each of which alternates lecture with hands-on development. We also customize the final afternoon, selecting from a set of optional modules to best match the needs and interests of each class. During the optional third day we dive deep into coding solutions for a wide range of big data problems.

Hadoop Conceptual and Physical Architecture

Learn the how and why of Hadoop through a thorough discussion of its conceptual and physical architecture. Students will also learn to configure Hadoop for all three modes of operation: standalone (local), pseudo-distributed, and fully distributed.

  • Execution layer: divide & conquer, speculative execution, processing fidelity.
  • Storage layer: fault tolerance, replication.
  • Managing jobs: Java interfaces, monitoring tools.
  • Runtime: execution modes, configuration, job execution.

Thinking in MapReduce with Common and Advanced Patterns

Understand the principles behind MapReduce and how to implement common MapReduce patterns. Advanced patterns are also covered in depth.

  • Divide & conquer parallelism.
  • Typical patterns: filtering, parsing, counting, binning, distributed tasks, sorting, chaining.
  • Advanced patterns: multi-field grouping, secondary sort, co-grouping, stable distributed sort.
  • Problem modeling: good & bad use cases for Hadoop, modeling machine learning.
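
To give a feel for the counting pattern listed above, here is a minimal in-memory sketch of the map, shuffle, and reduce steps applied to word counting. It is an illustration only, not course material: it uses plain Java collections rather than the Hadoop API, and the class and method names (WordCountSketch, map, shuffle, reduce, run) are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the map -> shuffle -> reduce flow.
// No Hadoop classes are used; all names here are hypothetical.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, keeping keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Drive the three phases over a small list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(run(Arrays.asList("big data big cluster", "data node")));
    }
}
```

In a real Hadoop job the framework performs the shuffle across machines, so only the map and reduce logic would be written by the developer.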

Hadoop API

Review the core Hadoop API interfaces and classes. This will be reinforced through the development of increasingly complex Hadoop applications.

  • Key classes: Mapper, Reducer, Input/OutputFormat, Writable.
  • Key formats: Text, Sequence, Map files.
  • Job management: Job, Configuration, JobClient.

Architectural Details

Dig into the guts of Hadoop, to understand cluster configuration and deployment.

  • Physical architecture: HDFS name & data nodes, job scheduler, task tracker.
  • Cluster runtime: distributing configuration, management scripts, Web GUI.
  • MapReduce details: splits, mappers, shuffling, reducing.
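
As a rough illustration of the shuffling step, the sketch below shows how map output keys can be routed to reducers by hashing. The arithmetic (mask off the sign bit, then take the remainder) is the same as in Hadoop's default HashPartitioner, but everything else is an assumption made for the example: the class name PartitionSketch is hypothetical, and plain String keys stand in for Hadoop's Writable key types, whose hash codes differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of hash partitioning; it does not use the Hadoop API.
class PartitionSketch {

    // Pick a reducer index for a key: clear the sign bit so the result is
    // never negative, then take the remainder by the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Route every key to one of numReducers buckets. Equal keys always
    // land in the same bucket, which is what lets a reducer see all
    // values for a given key.
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "apple", "cherry", "banana");
        List<List<String>> buckets = partition(keys, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

Which bucket each key lands in depends on its hash code, so the interesting property is not the exact assignment but that repeated keys are never split across reducers.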

Alternative Interfaces

Discuss high-level interfaces for Hadoop and MapReduce.

  • Query languages: Hive, Pig.
  • Key-value stores: Cassandra, HBase, Voldemort.
  • Workflow: Cascading, Oozie.

In-depth Exercises (day 3)

Write, debug, and review solutions to common and complex tasks suitable for Hadoop, including:

  • N-way merge of data from heterogeneous data sources.
  • Applying machine learning algorithms (clustering, classification).
  • Analyzing large-scale networks (graphs).

Requirements

Please make sure you bring a laptop with:

  • 1 GB of RAM
  • USB 2.0 Port
  • Unix compatible OS/Shell (Linux, OpenSolaris, Mac OS X, Cygwin on Windows)
  • Java IDE (Eclipse or IntelliJ IDEA are the best options)
 
What our customers say