Talk on using search with big data analytics

July 8, 2011

A few weeks back I was at the Basis Technology Government Users Conference in Washington, DC. It was an interesting experience, meeting people from agencies responsible for processing lots of important data. One thing I noticed is that in the Bay Area, your name tag at an event tries to convey that you’re working on super-cool stuff. Here in DC, it’s cooler to be classified. For example, name tags more…

Proposals for Big Data web mining talk

November 16, 2009

I’m going to be giving a talk at the Bay Area ACM data mining SIG in December, and I need to finalize my topic soon – like today 🙂 I was going to expand on my Elastic Web Mining talk (“Web mining for SEO keywords”) from the ACM data mining unconference a few weeks back. But the fact that I’ll have 10s to 100s of millions of web page data more…

Elastic Web Mining Talk

November 2, 2009

Here’s the presentation I gave at the ACM data mining unconference on elastic web mining – how to create scalable, reliable, and cost-effective web mining solutions using an open source stack (Hadoop, Cascading, Bixo) running in Amazon’s Elastic Compute Cloud (EC2). But I don’t see my notes showing up, so here’s the PDF version with full notes, which make the resulting slides a lot more meaningful. more…

Presenting at 2009 Silicon Valley Data Mining Camp

October 30, 2009

This coming Sunday is the big Bay Area data mining “unconference”, and with more than 200 people already signed up, it’s going to be a lot of fun. I’ll be presenting at some point during the day – since it’s an unconference, you don’t really know who’s going to be talking about what/when. My topic is “Elastic web mining using open source (Hadoop/Cascading/Bixo) in Amazon’s EC2 cloud”. If you scan more…