Presenting at Strata Conference Tutorial on Hadoop

January 27, 2011

This coming Tuesday, Feb 1st, I’ll be helping at the “How to Develop Big Data Applications for Hadoop” tutorial. My sections will cover the “why” of using Amazon Web Services for Hadoop (hint – scaling, simplicity, savings) and the “how” – mostly the nuts and bolts of running Hadoop jobs using Elastic MapReduce. I’ll also be roaming the room during the hands-on section, helping out attendees. I’m more…

Announcing the Public Terabyte Dataset project

November 1, 2009

We’re very excited to announce the Public Terabyte Dataset project. This is a high-quality crawl of top web sites, using AWS’s Elastic MapReduce, Concurrent’s Cascading workflow API, and Bixo Labs’ elastic web mining platform. Hosting for the resulting dataset will be provided by Amazon in S3, and the dataset will be freely available to all EC2 users. In addition, the code used to create and process the dataset will be available for download more…