In my previous blog post on text feature selection, I covered some of the key steps:
Extract the relevant text from the content.
Tokenize this text into discrete words.
Normalize these words (case-folding, stemming, and a bit of filtering out “bad words”).
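As a rough illustration, the steps above could be sketched in Python as follows. This is only a minimal sketch and not the code from the previous post: the function names, the tiny `STOP_WORDS` set, and the `naive_stem` suffix stripper are all hypothetical stand-ins, and a real pipeline would likely use a proper HTML parser and a real stemmer.

```python
import re
from html import unescape

# Hypothetical "bad words" list; a real list would be much larger.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are"}


def extract_text(html):
    """Crudely strip tags and decode entities (assumes simple HTML input)."""
    no_tags = re.sub(r"<[^>]+>", " ", html)
    return unescape(no_tags)


def tokenize(text):
    """Split text into runs of ASCII letters."""
    return re.findall(r"[A-Za-z]+", text)


def naive_stem(word):
    """Toy suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tokens):
    """Case-fold each token, then stem it."""
    return [naive_stem(t.lower()) for t in tokens]


def filter_terms(terms):
    """Drop stop words and single-letter terms."""
    return [t for t in terms if t not in STOP_WORDS and len(t) > 1]


def text_features(html):
    """Run all four steps: extract, tokenize, normalize, filter."""
    return filter_terms(normalize(tokenize(extract_text(html))))


if __name__ == "__main__":
    print(text_features("<p>The crawlers are fetching pages</p>"))
```

Keeping each step as its own function makes it easy to swap in a better extractor or stemmer without touching the rest of the chain.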
In this blog post I'm going to talk about improving the quality of more...
Ken will be giving a talk on Tuesday, June 3rd at this year's Hadoop Summit in San Jose. His presentation covers use cases for both batch and real-time similarity, and will discuss past projects that used Hadoop and Solr to generate high-quality results at scale for several different clients.