Training

Having trouble finding Big Data programmers? Why not train your own?

Scale Unlimited’s hands-on, in-person Apache Hadoop®, Cascading, Apache Solr™ and Amazon Elastic MapReduce training classes teach Java programmers everything they need to know to start solving Big Data problems, using lab exercises and real-world examples to reinforce lecture content.

Companies such as HP, Sun, Apple, eBay, Newegg, RIM, IBM, Lockheed, Deutsche Telekom and Nokia have benefited from our training – you can too.

Courses

We provide on-site training for groups of 5 to 20 people. Each course is customized for your needs, using any combination of four of our 1/2 day modules (see below) to create a class that is optimized for your particular use cases. We can also include a final 1/2 day “open lab” session, where students receive expert help applying what they’ve learned to their own real-world problems.

These private classes provide an environment where students are free to discuss details of their use cases, leading to better comprehension and real results at the end of the class.

Modules

  • Big Data Tutorial: Get answers to questions about big data in general, and specific technologies used to solve big data problems.
  • Introduction to Hadoop: Principles of Hadoop development, operations & ecosystem.
  • Advanced Hadoop: Extending Hadoop, common data processing patterns, debugging and tuning workflows, best practices for testing.
  • Amazon Elastic MapReduce: Using Amazon’s Elastic MapReduce service to quickly process big data for less money.
  • Big Data and Apache Solr: Data processing & indexing workflows for creating large-scale, high-performance search services.
  • Introduction to Cascading: Defining complex data processing workflows using the Cascading open source workflow API. This module is available in both 1/2 day and full day formats.
  • Advanced Cascading: Dive deeper into Cascading and learn about extending it via custom operations/taps/schemes, optimizing workflows, creating re-usable components, trapping bad data, and best practices for testing & debugging complex workflows. This module is available in both 1/2 day and full day formats.

Instructor

Ken Krugler has been a software developer, consultant, trainer and entrepreneur for over 20 years. Previously, he founded Krugle in 2005, where he was a pioneer in code search and an early adopter and supporter of Apache Nutch, Apache Hadoop, Apache Lucene and Apache Solr. He is a committer for the Apache Tika project, a member of the Apache® Software Foundation, an author of one of the Lucene in Action use cases, and an expert in scalable data processing workflows, search, internationalization, web crawling and data mining.

Partners

We partner with LucidWorks to provide in-depth training on using Solr with big data, for scalable end-to-end search and data analytics solutions.

We partner with DataStax to teach developers what they need to know to succeed with Apache Cassandra.

Pricing

On-site training is $800 per student-day, plus travel expenses. The minimum class size is 5 for training located in the San Francisco Bay Area, 10 for training elsewhere in California, and 15 for training in other US-based locations.

Contact us if you are interested in training outside of the US.

Course Materials

Participants will receive an electronic copy of all slides and handouts, links to other resources and downloads, and access to post-class support via a moderated mailing list.

Interested?

Fill out our contact form, or e-mail classes@scaleunlimited.com.