My goal is to analyze some corpus (Twitter for now) for emotional content. Just today I realized it would make sense to search for word stems rather than keep an exhaustive list of emotional words. So I've been exploring nltk.stem, only to find that there are several different stemmers. I'd like to ask the Stack Overflow linguists whether LancasterStemmer, PorterStemmer, RegexpStemmer, RSLPStemmer, or WordNetStemmer is best, preferably with some justification.
In NLTK's SnowballStemmer, the 'english' stemmer is better than the original 'porter' stemmer. Extra stemmer tests can be found in nltk.test.unit.
Snowball stemmer: The English Snowball algorithm is also known as the Porter2 stemming algorithm. It is widely regarded as an improvement over the original Porter stemmer, a view shared by Martin Porter, who wrote both.
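For a concrete feel of the difference, here is a small sketch that runs one word through both algorithms. It assumes NLTK is installed; compare_stemmers is a name made up for this example, and "generously" is just a sample word.

```python
from nltk.stem import SnowballStemmer

# SnowballStemmer exposes both algorithms under language names:
# "porter" is the original Porter algorithm, "english" is Porter2.
porter = SnowballStemmer("porter")
porter2 = SnowballStemmer("english")

def compare_stemmers(word):
    """Return (original Porter stem, Porter2 stem) for one word."""
    return porter.stem(word), porter2.stem(word)

print(compare_stemmers("generously"))  # ('gener', 'generous')
```

The original algorithm cuts the word down to a fragment, while Porter2 stops at a recognizable word, which is usually what you want when matching against a word list.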
Stemming is faster than lemmatization because it chops off word endings by rule, irrespective of context, whereas lemmatization depends on context (typically the word's part of speech). Stemming is a rule-based approach; lemmatization looks the word up in a dictionary and returns its canonical form. Lemmatization is generally more accurate than stemming, at the cost of speed.
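To make the rule-based versus dictionary-based distinction concrete, here is a deliberately tiny toy. Neither function is a real stemmer or lemmatizer, and nothing here is NLTK code: the suffix list, the mini-dictionary, and the names toy_stem and toy_lemmatize are all invented for illustration.

```python
# Toy illustration only: not a real stemmer or lemmatizer.

# Hypothetical suffix rules, tried in order.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini-dictionary: (word, part of speech) -> lemma.
LEMMAS = {
    ("better", "adj"): "good",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
}

def toy_stem(word):
    """Strip the first matching suffix, ignoring context entirely."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word, pos):
    """Look the word up together with its part of speech."""
    return LEMMAS.get((word, pos), word)
```

With these definitions, toy_stem("meeting") always gives "meet", while toy_lemmatize("meeting", "noun") keeps "meeting" and toy_lemmatize("meeting", "verb") gives "meet". The lookup can also map "better" (as an adjective) to "good", which no suffix rule could do.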
nltk.stem is a package that performs stemming using different classes.
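As a rough sketch of how one of those classes could serve the emotional-content goal in the question: stem a small seed list once, then stem each token of a tweet and intersect the two sets. It assumes NLTK is installed. The seed list, the simple regex tokenizer, and the function name emotional_stems are assumptions for illustration, not a recommended lexicon or tokenizer.

```python
import re
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

# Hypothetical seed list; a real emotion lexicon would be much larger.
EMOTION_WORDS = ["fear", "joy", "anger", "sadness"]
EMOTION_STEMS = {stemmer.stem(word) for word in EMOTION_WORDS}

def emotional_stems(text):
    """Return the emotion stems that occur in the text."""
    # Crude tokenizer: lowercase runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stemmer.stem(token) for token in tokens} & EMOTION_STEMS

print(emotional_stems("So many fears, so little joy"))
```

Because the seed words and the tweet tokens go through the same stemmer, it matters less which stemmer you pick than that you use it consistently on both sides.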
It may be a bit different from what you are asking, but the Nodebox Linguistics library contains an is_emotive() function which seems to check whether a word is a recursive hyponym of certain emotional words. From commonsense.py:
ekman = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
other = ["emotion", "feeling", "expression"]
Not a stemmer, but an interesting approach to check out.
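To show the idea without pulling in WordNet, here is a minimal sketch of a recursive hyponym check over a hand-written hierarchy. This is not Nodebox's implementation: the HYPERNYMS table and the name is_emotive_sketch are hypothetical, and only the two word lists come from the snippet above.

```python
ekman = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
other = ["emotion", "feeling", "expression"]
EMOTIVE_ROOTS = set(ekman + other)

# Hypothetical child -> parent (hypernym) links standing in for WordNet.
HYPERNYMS = {
    "fury": "rage",
    "rage": "anger",
    "terror": "fear",
    "glee": "joy",
    "table": "furniture",
}

def is_emotive_sketch(word, hypernyms=HYPERNYMS):
    """Walk up the hypernym chain until an emotive root or a dead end."""
    seen = set()
    while word is not None and word not in seen:
        if word in EMOTIVE_ROOTS:
            return True
        seen.add(word)  # guards against cycles in the table
        word = hypernyms.get(word)
    return False
```

Here "fury" counts as emotive because it climbs through "rage" to "anger", while "table" dead-ends at "furniture" and does not.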