
PorterStemmer doesn't seem to work

I am new to Python and practising with examples from a book.
Can anyone explain why nothing changes when I try to stem an example string with this code?

>>> from nltk.stem import PorterStemmer
>>> stemmer=PorterStemmer()
>>> stemmer.stem('numpang wifi stop gadget shopping')
'numpang wifi stop gadget shopping'

But when I pass a single word, it works:

>>> stemmer.stem('shopping')
'shop'

Aikin asked Oct 19 '12 12:10




1 Answer

Try this:

res = ",".join([ stemmer.stem(kw) for kw in 'numpang wifi stop gadget shopping'.split(" ")])

The problem is, most likely, that the stemmer works on single words. Taken as a whole, your string has no "root" word, while the single word "shopping" has the root "shop". So you'll have to stem each word separately.

Edit:

From the NLTK source code:

Stemming algorithms attempt to automatically remove suffixes (and in some
cases prefixes) in order to find the "root word" or stem of a given word. This
is useful in various natural language processing scenarios, such as search.

So I guess you do have to split the string yourself.
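
For reference, the word-by-word approach can be wrapped in a small helper. This is only a sketch: `stem_words` is a hypothetical name (not part of NLTK), and it assumes plain whitespace splitting is good enough, with no handling of punctuation.

```python
from nltk.stem import PorterStemmer


def stem_words(text, stemmer=None):
    """Stem each whitespace-separated word of `text` on its own.

    `stem_words` is a hypothetical helper, not an NLTK function.
    """
    stemmer = stemmer or PorterStemmer()
    # str.split() with no argument splits on any run of whitespace
    return [stemmer.stem(word) for word in text.split()]


stems = stem_words('numpang wifi stop gadget shopping')
# The last word becomes 'shop', as in the single-word call from the question
stemmed_text = " ".join(stems)
print(stemmed_text)
```

Joining with a space gives back a sentence-like string; use `",".join(stems)` instead if you want the comma-separated form from the one-liner above.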

Samuele Mattiuzzo answered Oct 07 '22 03:10