Scale Unlimited and Aster Data Systems presented ScaleCamp on June 9th in Santa Clara, CA.
Speaker presentations are now available to view. Please click on the link below each speaker's name. (Not all presentations have been uploaded yet; the remaining ones will be made available when received.)
What was it?
ScaleCamp was an informal event that brought together the developer community in the Hadoop ecosystem to share first-hand experience reports.
ScaleCamp was intended for developers and thought leaders to expand their knowledge with real-life use cases of technologies in the Hadoop ecosystem. It was run like a BarCamp, with open opportunities for people to present their experiences.
There were three parallel tracks with twelve slots.
Presentations:
Ted Dunning - DeepDyve
Katta was developed as a shard-distributed form of Lucene in which the use of ZooKeeper made shard management simple. At DeepDyve, we have re-imagined Katta as a completely generic shard manager for anything that needs reliable, scalable, distributed and aggregated operations against shards.
Stefan Groschupf - Scale Unlimited
Extracting information from the web using Hadoop, Cascading and Bixo
Extracting information from the Web becomes a challenge when it needs to scale. At EMI Music, a stack of Hadoop, Cascading and the crawler toolkit Bixo is used to fetch and process a large set of websites, and the results feed into a BI system.
David Fallside - IBM
Building Hadoop applications for a media and a financial services company
We have created a number of proof-of-concept projects with media and financial services companies that use Hadoop, Nutch and related technologies. A notable feature of these projects is integration, both of data obtained from different sources and of Hadoop with visualization technologies. I intend to talk about two projects, one for a media company and one for a financial services company, describe the applications we built around Hadoop for each, and show some of their output visualizations. I'll include some speeds and feeds as well.
Paul Baclace
Visualizing Map-Reduce
A demo of interactive time-space diagrams that illustrate the performance of Hadoop Map-Reduce jobs.
Peter Pawlowski - Aster Data Systems
MapReduce Inside a Database System - When and How
Peter will discuss Aster Data’s in-database MapReduce technology and present use cases where it complements other technologies like Hadoop.
Dr. DJ Patil - LinkedIn
Large Scale Analytics at LinkedIn
Dr. Patil will discuss LinkedIn's current analytics framework and the methodology and technologies the company plans to put in place for future growth.
Alex Dorman - ContextWeb
Using Hadoop for frequent data aggregation
Alex will share ContextWeb's experience of using Hadoop for frequent data aggregation, ad performance optimization, report generation and analytics.
Paco Nathan - ShareThis
Mashing technologies in the Cloud for Big Data Analysis
Paco will discuss how ShareThis mashes technologies in the Cloud for Big Data analysis, leveraging Aster nCluster Cloud Edition, Amazon Elastic MapReduce, Cascading and other AWS services in its analytics system architecture.
Matt Ingenthron - NorthScale
Memory Caching to Scale: is it Different on a Public Cloud?
Many developers have come to rely on a distributed memory cache integrated into their application architectures to deliver interactive, responsive sites. When these applications move to cloud compute environments, there can be challenges with throughput and latency and with getting the desired elasticity out of the environment. Matt will review NorthScale's experience with distributed memory caches in public clouds and approaches to preserving the needed performance and scalability from the application's point of view.
Kevin Beyer - IBM
Advanced data flow analytics in Jaql
Kevin will present “Advanced data flow analytics in Jaql”
Jean-Daniel Cryans - Openplaces.org; École de technologie supérieure, Montreal
A project to build the world’s largest organized repository of maps
Jean-Daniel will talk about Hadoop at openplaces.org, a project to build the world's largest organized repository of maps, pictures, details and advice about every place in the world. The site runs directly off an EC2-hosted HBase cluster, and internally a 40-node cluster runs more than 50 MapReduce jobs to batch-process crawl data and handle indexing, named entity recognition and other data mining tasks. The presentation covers the experience of using Hadoop and HBase in such an environment, with advice for those considering the same path.
Rusty Burchfield and Doug Judd - Zvents
Rusty Burchfield and Doug Judd will co-present. Rusty will start by describing how the Zvents team uses scalable computing technology for analytics and reporting. They'll give a brief demo of some of the applications built using Hadoop, Hypertable and Cascading. Doug will finish with a brief overview of Hypertable and its current status.