Scale Unlimited has the expertise, resources and services you need for scalable, cost-effective solutions to your data processing, search, and web mining problems.
We provide a full portfolio of services for the architecture, design, development, deployment and operation of data processing, indexing, and web mining systems built with Hadoop, Cascading, Cassandra and Solr. We also have extensive background in Amazon Web Services, particularly EC2, S3 and Elastic MapReduce.
We have deep expertise in using Hadoop to solve scaling and performance problems with data processing. We can also help with Hadoop cluster provisioning and operations, performance tuning, mentoring and training, and related technologies such as Sqoop, Hive, Cassandra, HBase and Cascading.
Our extensive background with Cascading – an open source API that dramatically simplifies the process of building scalable, reliable Hadoop workflows – gives our customers a solid foundation for their data processing platforms.
For customers who ultimately need to turn their big data into a searchable index, we offer consulting on Solr, one of the most popular open source solutions for a wide range of search problems. We are also experts in using Solr’s NoSQL functionality to build fast, efficient analytics solutions that combine Hadoop and Solr.
We provide scalable, cost-effective solutions for a wide range of web mining tasks, including:
- Content aggregation – fetching, parsing, normalizing and aggregating data records from multiple web sites.
- Competitive pricing – extracting specific pricing data per site, or per product/SKU.
- Link spam analysis – fetching and analyzing content referenced from links, and generating spam scores.
We can customize the Bixo web crawler toolkit for a wide range of crawling strategies:
- Broad crawls – breadth-first crawls of large portions of the web, typically constrained by a large set of target domains.
- Vertical crawls – depth-first crawls of subsets of the web, using content analysis to guide the crawl to areas that are of primary interest.
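To make the difference between the two strategies concrete, here is a minimal sketch in Python. It does not use the Bixo API. The in-memory `PAGES` map, the `broad_crawl` and `vertical_crawl` functions, and the keyword-based `is_relevant` check are all hypothetical stand-ins for real fetching and content analysis, chosen only to show how the crawl frontier behaves in each case.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, outlinks).
# A real crawler would fetch and parse pages instead.
PAGES = {
    "http://a.example/": ("hadoop news", ["http://a.example/1", "http://b.example/"]),
    "http://a.example/1": ("hadoop tuning tips", ["http://a.example/2"]),
    "http://a.example/2": ("solr and hadoop", []),
    "http://b.example/": ("cooking recipes", ["http://b.example/1", "http://c.example/"]),
    "http://b.example/1": ("more recipes", []),
    "http://c.example/": ("hadoop jobs", []),
}


def broad_crawl(seeds, allowed_domains, limit=100):
    """Breadth-first crawl, constrained to a set of target domains."""
    frontier = deque(seeds)  # FIFO queue gives breadth-first order
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < limit:
        url = frontier.popleft()
        if urlparse(url).hostname not in allowed_domains or url not in PAGES:
            continue  # outside the target domains, or unknown page
        visited.append(url)
        for link in PAGES[url][1]:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


def is_relevant(text, keyword):
    """Toy content analysis: the page mentions the keyword."""
    return keyword in text.split()


def vertical_crawl(seeds, keyword, limit=100):
    """Depth-first crawl that only follows links from relevant pages."""
    frontier = list(reversed(seeds))  # LIFO stack gives depth-first order
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < limit:
        url = frontier.pop()
        if url not in PAGES:
            continue
        text, links = PAGES[url]
        visited.append(url)
        if not is_relevant(text, keyword):
            continue  # fetched, but off-topic: do not expand its links
        for link in reversed(links):  # reversed so the first link is popped first
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    print(broad_crawl(["http://a.example/"], {"a.example", "b.example"}))
    print(vertical_crawl(["http://a.example/"], "hadoop"))
```

In this toy setup the broad crawl visits every page on the allowed domains level by level, while the vertical crawl follows the on-topic chain to its end and stops expanding as soon as it reaches an off-topic page.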