ApacheCon Big Data 2016
Earlier this month I flew to Vancouver, a wonderful city I’d never had the chance to visit. My excuse was that I was giving a talk at this year’s ApacheCon Big Data conference, which took place in Vancouver from May 9th to 12th.
Part of the fun of attending a conference like this is the chance to meet people I’d only interacted with via email. For example, Nick Burch is a super-active Tika committer, so I got to say hi while sitting in on his talk, “What’s new in Tika 2.0”.
My talk was on creating faster ETL workflows using Cascading and the Cascading-Flink planner to target Flink on YARN. I think that’s enough buzzwords for one post. The net result was a 50% increase in speed for an ETL workflow that was (in some ways) a worst case for Flink: it has many grouping/joining operations where the data doesn’t fit in memory, which means significant spilling to disk. My slides are up on Slideshare.