Scale Unlimited is based in Nevada City, California and provides consulting and training services for big data analytics, search, and web mining.

The company was founded in 2008 by Stefan Groschupf, Chris Wensel, and Ken Krugler, three of the world’s leading experts in scalable, reliable data analytics, workflow design and web mining.

All three are well-known community members and contributors to key open source projects, including Hadoop, Bixo, Cascading, Solr, Lucene, Katta and Tika.

Solutions from Scale Unlimited are built using these and other widely used and well supported open source packages, providing maximum flexibility with no commercial lock-in.


Scale Unlimited solves three major problems that the founders experienced first-hand at previous startups and consulting projects.

First, processing big data requires a workflow system that is efficient, reliable and scalable. Even a "small" dataset often contains gigabytes of data. A workflow based on ad hoc scripts isn’t reliable, and single-server solutions don’t scale.

With Scale Unlimited, solutions are built using Hadoop- and Cascading-based workflows.

Second, having an internal team ready and able to assume responsibility is critical for the long-term success of the project. A consulting project without a hand-off plan is destined to fail.

We provide introductory and advanced training on the open source technologies we use, including Hadoop, Cascading and Solr. And we know how to mentor your team, ensuring a smooth and successful hand-off at the end of the project.

Third, companies that want to leverage web data have expertise in the data and in how it is analyzed and monetized, but they usually lack expertise in the arcane art of web crawling. As a result, they wind up spending a lot of time and money getting the data instead of using it.

With Scale Unlimited, the focus is on the customization needed to define what to crawl and how to process the results. You only pay for what you use. You only develop what you need, drawing on the unique knowledge you have about the problem space. You don’t configure servers, manage clusters, monitor crawls, block honeypots, calm down angry webmasters, or pay for idle hardware.


Ken Krugler – Veteran developer and entrepreneur with 25+ years of experience. Founder and President of TransPac Software, a 20-year leader in internationalization, mobile devices, and search consulting. Founder and CTO of Krugle, a vertical search engine and enterprise appliance for code and technical information (funded by Emergence Capital). Co-founder of the Bixo web mining project. Author and speaker on vertical search and web mining.

Chris Schneider – VP of TransPac Software. Technical lead on the Krugle vertical web crawler. BS in Computer Science from MIT, MS in Education from UC Berkeley.

Vivek Magotra – Technical lead on the Krugle web crawler page classifier. Part of the team that developed a large-scale Hadoop-based data processing system. BS & MS in Computer Science from the University of Pune.

Technical Advisors

Chris Wensel – Founder of Concurrent and author of the Cascading data processing project. Co-founder of Scale Unlimited, the first Hadoop training company. Former Chief Architect at Thomson Reuters.

Stefan Groschupf – Co-founder and CEO of Datameer, the leader in big data spreadsheet analytics using Hadoop. Founder and President of 101tec, a multinational search/Hadoop consulting company. Co-founder of Scale Unlimited, a cloud computing training company. Creator of the Katta search project. Co-founder of the Bixo web mining project.

Peter Voß – Technical lead on Datameer’s Hadoop-based data processing system. Former lead on an ultra-high-volume data processing system for Deutsche Post.