PorterStemmer doesn't seem to work

Tags: python, nltk, porter-stemmer

I am new to Python and practising with examples from a book.
Can anyone explain why nothing changes when I try to stem an example string with this code?

>>> from nltk.stem import PorterStemmer
>>> stemmer=PorterStemmer()
>>> stemmer.stem('numpang wifi stop gadget shopping')
'numpang wifi stop gadget shopping'

But when I do this, it works:

>>> stemmer.stem('shopping')
'shop'

asked Oct 19 '12 12:10

Aikin

1 Answer

Try this:

res = " ".join(stemmer.stem(kw) for kw in 'numpang wifi stop gadget shopping'.split())

The problem is probably that the stemmer works on single words. Your whole string, treated as one word, has no "root", while the single word "shopping" has the root "shop". So you have to stem each word separately.

Edit: from the NLTK source code:

Stemming algorithms attempt to automatically remove suffixes (and in some
cases prefixes) in order to find the "root word" or stem of a given word. This
is useful in various natural language processing scenarios, such as search.

So I guess you do indeed have to split the string yourself.
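The per-word approach can be wrapped in a small helper. This is only a sketch: `stem_sentence` and its `stem` parameter are names made up for illustration, not part of NLTK, and it assumes that splitting on whitespace is good enough tokenization for your text.

```python
def stem_sentence(text, stem=None):
    """Stem each whitespace-separated word in `text` and rejoin with spaces.

    `stem` is any callable that maps one word to its stem. If it is not
    given, NLTK's PorterStemmer is used (hypothetical default chosen for
    this sketch; requires nltk to be installed).
    """
    if stem is None:
        # Imported lazily so the helper also works with other stemmers
        from nltk.stem import PorterStemmer
        stem = PorterStemmer().stem
    # str.split() with no argument splits on any run of whitespace
    return " ".join(stem(word) for word in text.split())
```

Calling `stem_sentence('numpang wifi stop gadget shopping')` stems every word on its own, so the last word should come back as 'shop', as in the question. For text with punctuation, a real tokenizer would likely be a better choice than `split()`.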

answered Oct 07 '22 03:10

Samuele Mattiuzzo
