Advanced Cascading
This module covers creating custom operations, optimizing Flows, effective use of SubAssemblies, error handling with Traps, and monitoring Flows, along with best practices for each.
Who Should Attend?
Java developers who want to learn more about how best to use Cascading to solve real-world data processing problems.
Prerequisites
This module assumes basic knowledge of Hadoop and Cascading. We recommend completing both our Introduction to Hadoop and Introduction to Cascading modules. Relevant work experience is also highly valuable, as students who arrive with real-world problems in hand will benefit from the instructor’s input on their specific issues.
Participants should be comfortable reading and writing Java code; familiarity with Bash will help.
Outline – Half Day
- Custom Operations – Creating your own Functions, Filters and Buffers
- Hands-on Lab #1 – Extending Cascading
- Optimizing Workflows – Common Techniques
- Hands-on Lab #2 – Optimizations
- SubAssemblies & Cascades – Reusable Components, Reliable Workflows
- Failure Traps – How to Handle Bad Data
- Debugging & Monitoring Workflows – Best Practices
- Hands-on Lab #3 – Trapping Bad Data, Modularizing a Workflow
- Summary
Outline – Full Day
- Custom Operations – Creating your own Functions, Filters and Buffers
- Custom Types – Beyond Primitive Types in Tuples
- Hands-on Lab #1 – Extending Cascading
- Hadoop Integration – Data Interchange, Streaming Jobs
- Optimizing Workflows – Common Techniques
- Optimizing Hadoop – Tuning Hadoop Job Settings
- Hands-on Lab #2 – Optimizations
- SubAssemblies & Cascades – Reusable Components, Reliable Workflows
- Failure Traps – How to Handle Bad Data
- Debugging & Monitoring Workflows – Best Practices
- Hands-on Lab #3 – Trapping Bad Data, Modularizing a Workflow
- Summary