Cascading & GigaSpaces
We’ve just started a new project: a “planner” that lets you define & run complex workflows in GigaSpaces’ XAP environment, using the Cascading API.
There are lots of interesting challenges, mostly around the various impedance mismatches between the Cascading/Hadoop model of data storage and parallel map-reduce execution, and the in-memory data grid with transactional support that GigaSpaces provides.
Step one has been to create a Cascading Tap that lets a Hadoop-based workflow read from and write to a GigaSpaces “space” (that is, one or more partitions in their data grid).
Step two, now in progress, is to support running real map-reduce workflows on GigaSpaces XAP.
If we’re successful, we’ll wind up with the ability to run the same workflow in Hadoop (extreme scalability, batch) and GigaSpaces (low latency, incremental) without any changes to the workflow definition.
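To make that portability idea concrete, here’s a rough sketch of what a flow definition might look like. The key point is that the pipe assembly never mentions where tuples live, so only the Taps change between platforms. `GigaSpacesTap` and its space URL are hypothetical names standing in for the Tap described above, and the rest uses the Cascading 1.x-era API as I understand it; treat this as an illustration, not working code.

```java
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.operation.aggregator.Count;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.Tap;
import cascading.tuple.Fields;

public class PortableFlow {

    // The pipe assembly is platform-neutral: it only talks about
    // fields and operations, not about where the tuples are stored.
    public static Pipe makeAssembly() {
        Pipe pipe = new Pipe("lines");
        // Group identical lines together, then count each group.
        pipe = new GroupBy(pipe, new Fields("line"));
        pipe = new Every(pipe, new Count(new Fields("count")));
        return pipe;
    }

    public static void main(String[] args) {
        Pipe assembly = makeAssembly();

        // Batch case: bind the assembly to HDFS taps and run on Hadoop.
        Tap source = new Hfs(new TextLine(new Fields("line")), "input/lines");
        Tap sink = new Hfs(new TextLine(), "output/counts");

        // Low-latency case: bind the *same* assembly to the data grid
        // instead. GigaSpacesTap is hypothetical, per the post above.
        // Tap source = new GigaSpacesTap("jini://*/*/mySpace", ...);

        Flow flow = new FlowConnector(new Properties()).connect(source, sink, assembly);
        flow.complete();
    }
}
```

Whether the second, grid-backed binding can reuse the planner unchanged is exactly the open question from step two.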