Advanced Hadoop

This module covers topics that are commonly encountered by developers applying Hadoop to larger-scale, more complex real-world data processing problems.

We’ll cover extending Hadoop via custom input/output formats, implementing the most common Big Data processing patterns in Hadoop, monitoring and tuning cluster performance, and best practices for testing Hadoop workflows.

Who Should Attend?

Java developers who want to learn more about how best to use Hadoop to solve real-world data processing problems.

Prerequisites

This course assumes basic knowledge of Hadoop. We recommend completing our Introduction to Hadoop course or having equivalent hands-on experience. Relevant work experience is also highly valuable: students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.

Participants should be comfortable reading and writing Java code; familiarity with Bash will help.

Outline

Extending Hadoop – Custom partitioning, comparators, and input/output formats
Extending Hadoop Lab – Reading & processing a custom data format
Common Patterns – Filtering, sorting, binning and joining data sets
Common Patterns Lab – Implement a workflow that filters, sorts, and joins data
Monitoring – Best practices for making sure your jobs are running properly
Testing – How to write unit & integration tests to validate Hadoop workflows
Optimizations – Common causes of performance problems and how to fix them
Optimizations Lab – Dramatically improve the performance of a typical workflow
Summary
