Text feature selection for machine learning – part 2
In my previous blog post on text feature selection, I covered some of the key steps: extracting the relevant text from the content, tokenizing this text into discrete words, and normalizing these words (case-folding, stemming, and a bit of filtering out “bad words”). In this blog post I’m going to talk about improving the quality of the terms. But first I want to respond to some questions from part 1, about more…