Scale Unlimited/Cascading case study posted

September 2, 2011

We’re heavy users of the Cascading open source project, which lets us quickly build Hadoop-based workflows to solve custom data processing problems. Concurrent recently posted a Scale Unlimited Case Study that describes how we use Cascading, and the benefits to us (and thus to our customers). They also listed the various Cascading-related open source projects we sponsor, including the Solr scheme that makes it trivial to generate Solr search indexes more…

Comments are off for this post
Filed under: Uncategorized by kkrugler

Presenting at Strata Conference Tutorial on Hadoop

January 27, 2011

Tags: AWS, emr, hadoop

This coming Tuesday, Feb 1st, I’ll be helping at the “How to Develop Big Data Applications for Hadoop” tutorial. My specific sections will cover the “why” of using Amazon Web Services for Hadoop (hint – scaling, simplicity, savings) and the “how” – mostly discussing the nuts and bolts of running Hadoop jobs using Elastic MapReduce. I’ll also be roaming the room during the hands-on section, helping out the attendees. I’m more…

Comments are off for this post
Filed under: Uncategorized by kkrugler

Hadoop User Group Meetup Talk

April 22, 2010

Tags: avro, cascading, elastic mapreduce, hadoop, public terabyte dataset, simpledb

Last night I gave a presentation at the April Hadoop Bay Area User Group meetup, hosted by Yahoo. There were 250+ people in attendance, so interest in Hadoop continues to grow. Dekel has posted the slides of my talk, as well as a (very quiet) video. My talk was on the status of the Public Terabyte Dataset (PTD) project, and advice on running jobs in Amazon’s Elastic MapReduce (EMR) cloud. As more…

3 comments so far
Filed under: Uncategorized by kkrugler
