In my previous blog post on text feature selection, I covered some of the key steps:
Extract the relevant text from the content.
Tokenize this text into discrete words.
Normalize these words (case-folding, stemming).
Filter out "bad words" (stopwords and other noise).
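The steps above can be sketched as a small Python pipeline. This is a minimal illustration, not the code from the earlier post: the stopword list and the suffix-stripping "stemmer" are placeholder stand-ins (a real pipeline would use a proper stemmer such as Porter's).

```python
import re

# Illustrative stopword list, not exhaustive.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}

def simple_stem(word):
    # Crude suffix stripping for illustration only.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def extract_features(text):
    # 1. Tokenize into discrete words.
    tokens = re.findall(r"[a-zA-Z]+", text)
    # 2. Normalize: case-fold, then stem.
    normalized = [simple_stem(t.lower()) for t in tokens]
    # 3. Filter out "bad words" (stopwords, very short tokens).
    return [t for t in normalized if t not in STOPWORDS and len(t) > 2]

print(extract_features("Tokenizing and normalizing the extracted words"))
# → ['tokeniz', 'normaliz', 'extract', 'word']
```

Note that the stemmed output is not always a dictionary word; what matters for feature selection is that variants of the same word collapse to the same token.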
In this blog post, I'm going to talk about improving the quality of...
Ken will be giving a talk on Tuesday, June 3rd at this year's Hadoop Summit in San Jose. His presentation covers use cases for both batch and real-time similarity, and will discuss past projects that used Hadoop and Solr to generate high-quality results at scale for several different clients.