
How is stemming useful?

Simple question: when should we stem or lemmatize words? Is stemming helpful for all NLP tasks, or are there applications where using the full form of words might result in better accuracy or precision?

VJune asked Jan 24 '13 20:01



1 Answer

In the context of machine-learning-based NLP, stemming makes your training data denser. It reduces the size of the vocabulary (the number of distinct words used in the corpus) two- or three-fold, or even more for heavily inflected languages like French, where a single stem can generate dozens of word forms, for instance in the case of verbs.

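With the same corpus but fewer input dimensions, ML will generally work better, because each remaining feature is observed more often. Recall in particular should improve, since different forms of the same word now match one another.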
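To make the vocabulary reduction concrete, here is a minimal sketch. Note that `toy_stem` and its suffix list are made up for illustration; it is a crude suffix stripper, not the Porter algorithm or any real library's stemmer, and the tiny corpus is invented.

```python
# Hypothetical toy stemmer: strips one common English suffix.
# This is an assumption for illustration, NOT a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Invented mini-corpus: several inflected forms of three verbs.
corpus = (
    "walk walks walked walking "
    "jump jumps jumped jumping "
    "talk talks talked"
).split()

raw_vocab = set(corpus)                       # one dimension per surface form
stemmed_vocab = {toy_stem(w) for w in corpus}  # one dimension per stem

print(len(raw_vocab), "distinct words before stemming")
print(len(stemmed_vocab), "distinct stems after stemming")
```

Here the eleven surface forms collapse onto three stems, so each feature is backed by several occurrences instead of one. On a real corpus the reduction would be far less dramatic than in this hand-picked example.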

The downside is that if, in some cases, the actual word (as opposed to its stem) makes a difference, your system won't be able to leverage it. So you might lose some precision.
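The loss of precision can be sketched the same way. Again, `toy_stem` is a hypothetical suffix stripper written for this example, not a real stemmer, and the word pairs are hand-picked; a real stemmer would conflate a different set of words, but the effect is the same in kind.

```python
# Hypothetical toy stemmer (an assumption for illustration, not a real algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hand-picked pairs whose meanings differ although they share a stem:
# "news" (noun) vs "new" (adjective), "building" (noun) vs "build" (verb), etc.
pairs = [("news", "new"), ("building", "build"), ("meeting", "meet")]

# After stemming, both words of a colliding pair map to the same feature,
# so a model can no longer tell them apart.
collisions = [(a, b) for a, b in pairs if toy_stem(a) == toy_stem(b)]

for a, b in collisions:
    print(f"{a!r} and {b!r} both become {toy_stem(a)!r}")
```

If your task depends on that kind of distinction (a query about "news" should not match every document containing "new"), keeping the full word forms, or using a lemmatizer instead, may be the better trade-off.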

Blacksad answered Oct 19 '22 10:10
