Services
Scale Unlimited has the expertise, resources and services you need for scalable, cost-effective solutions to your data processing, search, and web mining problems.
We provide a full portfolio of services for the architecture, design, development, deployment and operation of data processing, indexing, and web mining systems built with Hadoop, Solr and Cascading.
Hadoop Consulting
We have deep expertise in using Hadoop to solve scaling and performance problems in data processing. We can also help with Hadoop cluster provisioning and operations, performance tuning, mentoring and training, and related technologies such as Sqoop, Hive, HBase and Cascading.
Web Mining
We provide scalable, cost-effective solutions for a wide range of web mining tasks, including:
- Content aggregation – fetching, parsing, normalizing and aggregating data records from multiple web sites.
- Competitive pricing – extracting specific pricing data on a per-site basis, or per-product/SKU.
- Link spam analysis – fetching and analyzing content referenced from links, and generating spam scores.
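As a rough illustration of the competitive pricing task, the sketch below normalizes scraped price strings and keeps the cheapest offer per SKU. It is only a minimal example under stated assumptions: the record layout `(site, sku, raw_price)`, the function names, and the US-style price format are hypothetical and are not taken from any Scale Unlimited system.

```python
from decimal import Decimal


def normalize_price(raw):
    """Turn a scraped price string such as '$1,299.00' into a Decimal.

    Assumes a US-style format: optional '$' prefix and ',' thousands separators.
    """
    cleaned = raw.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


def lowest_price_per_sku(records):
    """Group (site, sku, raw_price) records by SKU and keep the cheapest offer."""
    best = {}
    for site, sku, raw_price in records:
        price = normalize_price(raw_price)
        # Keep the first record seen for a SKU, then replace it only if cheaper.
        if sku not in best or price < best[sku][1]:
            best[sku] = (site, price)
    return best


# Hypothetical records, as they might look after fetching and parsing.
records = [
    ("site-a.example", "SKU-1", "$1,299.00"),
    ("site-b.example", "SKU-1", " $1,249.50 "),
    ("site-a.example", "SKU-2", "$19.99"),
]
result = lowest_price_per_sku(records)
```

A production workflow would run the same normalize-then-group step as a distributed job rather than in a single process, and would need per-site parsing rules for currencies and formats.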
Web Crawling
We can customize the Bixo web crawler toolkit for a wide range of crawling strategies:
- Broad crawls – breadth-first crawls of large portions of the web, typically constrained by a large set of target domains.
- Vertical crawls – depth-first crawls of subsets of the web, using content analysis to guide the crawl to areas that are of primary interest.
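The difference between the two strategies comes down to how the crawl frontier is ordered. The sketch below is not Bixo code; it is a minimal, single-process illustration over a hypothetical in-memory link graph. The broad crawl uses a plain FIFO queue, while the vertical crawl is approximated with a best-first priority queue driven by a made-up relevance score, standing in for real content analysis.

```python
from collections import deque
import heapq

# Hypothetical link graph standing in for fetched pages and their outlinks.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],
    "d": [],
    "e": [],
}

# Hypothetical relevance scores; a real crawler would derive these from content.
RELEVANCE = {"a": 0.5, "b": 0.1, "c": 0.9, "d": 0.2, "e": 0.8}


def broad_crawl(seed, links, max_pages):
    """Visit pages in breadth-first order starting from the seed."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def vertical_crawl(seed, links, score, max_pages):
    """Visit pages best-first, always expanding the highest-scoring known page."""
    seen = {seed}
    counter = 0  # tie-breaker so equal scores keep discovery order
    heap = [(-score(seed), counter, seed)]
    order = []
    while heap and len(order) < max_pages:
        _, _, url = heapq.heappop(heap)
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                counter += 1
                heapq.heappush(heap, (-score(target), counter, target))
    return order


broad_order = broad_crawl("a", LINKS, 5)
vertical_order = vertical_crawl("a", LINKS, RELEVANCE.get, 5)
```

With these sample scores, the vertical crawl follows the high-relevance branch through `c` and `e` before returning to `b`, whereas the broad crawl visits every page at one link depth before moving to the next.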
Cascading Consulting
For many customers, the results from web mining are part of a larger data processing workflow. Our extensive background with Cascading – an open source API that dramatically simplifies the process of building scalable, reliable Hadoop workflows – helps our customers get the most value out of the data extracted from the web.
Search Consulting
For customers who ultimately need to turn their web mining data into a searchable index, we offer consulting on Lucene, Solr, and Katta – three open source tools that together cover a wide range of search problems.