Hadoop User Group Meetup Talk

April 22, 2010

Last night I did a presentation at the April Hadoop Bay Area User Group meetup, hosted by Yahoo. 250+ people in attendance, so the interest in Hadoop continues to grow. Dekel has posted the slides of my talk, as well as a (very quiet) video. My talk was on the status of the Public Terabyte Dataset (PTD) project, and advice on running jobs in Amazon’s Elastic MapReduce (EMR) cloud. As more…

SimpleDB Tap for Cascading

March 16, 2010

Recently we’ve been running a number of large, multi-phase web mining applications in Amazon’s EC2 & Elastic MapReduce (EMR), and we needed a better way to maintain state than pushing sequence files back and forth between HDFS and S3. One option was to set up an HBase cluster, but then we’d be paying 24×7 for servers that we’d only need for a few minutes each day. We could also set more…