In my previous blog post on text feature selection, I covered some of the key steps:
Extract the relevant text from the content.
Tokenize this text into discrete words.
Normalize these words (case-folding, stemming, and a bit of filtering out “bad words”).
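As a rough illustration, the steps above can be sketched in a few lines of Python. This is only a minimal sketch, not the code from the earlier post: the function names, the tiny `BAD_WORDS` list, and the toy suffix-stripping stemmer are all assumptions made for the example, and a real pipeline would use a proper HTML parser and a real stemmer.

```python
import re
from html import unescape

# Hypothetical "bad words" list; a real one would be much longer.
BAD_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def extract_text(html):
    """Crude extraction: drop tags, decode entities, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def tokenize(text):
    """Split the text into discrete alphabetic words."""
    return re.findall(r"[A-Za-z]+", text)


def naive_stem(word):
    """Toy suffix stripper, standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tokens):
    """Case-fold each token, then stem it."""
    return [naive_stem(token.lower()) for token in tokens]


def filter_bad_words(terms):
    """Drop any term that appears in the bad-words list."""
    return [term for term in terms if term not in BAD_WORDS]


def text_features(html):
    """Run the four steps in order: extract, tokenize, normalize, filter."""
    return filter_bad_words(normalize(tokenize(extract_text(html))))
```

For example, `text_features("<p>The crawlers fetched links</p>")` returns `["crawler", "fetch", "link"]`. Filtering runs after normalization here so that the bad-words list only needs lowercase entries.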
In this blog post I'm going to talk about improving the quality of more...
Ken will be giving a talk on Thursday, September 11th at this year's Cassandra Summit in San Francisco. His presentation describes how Early Warning (one of Scale Unlimited's clients) uses Cassandra and Solr to handle fuzzy entity matching across hundreds of millions of people and companies.