Earlier this month I flew to Vancouver, a wonderful city I'd never had the chance to visit. My excuse was that I was giving a talk at this year's ApacheCon Big Data conference, which took place in Vancouver from May 9th to 12th.
Part of the fun of attending a conference...
At Scale Unlimited we participate in a number of open source projects. Many of these have been recently updated...
cascading.utils (2.6.0) - Updated to Hadoop 2.4 & Cascading 2.6. Fixed job naming issue. More flexible tuple logging.
bixo (0.9.2) - Updated to Hadoop 2.4 & Cascading 2.6. Fixed bug with extracted outlink data.
crawler-commons (0.6) - Many sitemap & robots.txt processing fixes and improvements.
Tika (1.9) - Fixes for external parsers, new formats, improved server functionality, and much more.