I need a good Python module for stemming text documents in the pre-processing stage. I found this one, http://pypi.python.org/pypi/PyStemmer/1.0.1, but I cannot find the documentation in the link provided. If anyone knows where to find the documentation, or knows of any other good stemming algorithm, please help.

Need a python module for stemming of text documents

2 Answers

You may want to try NLTK

>>> from nltk import PorterStemmer
>>> PorterStemmer().stem('complications')
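
For the pre-processing stage the question describes, the stemmer is usually applied token by token. A minimal sketch, assuming a simple regex tokenizer is good enough for the documents at hand; `stem_document` and `porter_stem_document` are hypothetical helper names, not part of NLTK:

```python
import re


def stem_document(text, stem):
    """Lowercase `text`, split it into alphabetic tokens, and stem each one.

    `stem` is any callable that maps a word to its stem.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens]


def porter_stem_document(text):
    """Stem `text` with NLTK's Porter stemmer (requires nltk to be installed)."""
    # Imported lazily so that stem_document itself works without nltk.
    from nltk.stem import PorterStemmer
    return stem_document(text, PorterStemmer().stem)
```

Passing the stemmer in as a callable makes it easy to swap in a different stemmer later without touching the tokenizing code.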

142

answered Oct 20 '22 01:10

ditkin

All the stemmers discussed here are algorithmic stemmers, so they can always produce unexpected results such as:

In [3]: from nltk.stem.porter import *

In [4]: stemmer = PorterStemmer()

In [5]: stemmer.stem('identified')
Out[5]: u'identifi'

In [6]: stemmer.stem('nonsensical')
Out[6]: u'nonsens'
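
The outputs above are not bugs: an algorithmic stemmer strips suffixes by rule and never checks whether the result is a real word. A deliberately tiny sketch of the idea follows. It is not the Porter algorithm, and the suffix list is made up for illustration:

```python
# Toy rule-based stemmer: strip the first matching suffix, longest first.
# The suffix list is an illustrative assumption, far smaller than Porter's rules.
SUFFIXES = ("ical", "ing", "ed", "s")


def toy_stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(toy_stem("linked"))       # link
print(toy_stem("identified"))   # identifi
print(toy_stem("nonsensical"))  # nonsens
```

The rule that correctly turns 'linked' into 'link' is the same one that leaves 'identifi' behind, because nothing consults a word list.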

To get the correct root words, one needs a dictionary-based stemmer such as the Hunspell stemmer. A Python implementation of it is available at the following link. Example code is here:

>>> import hunspell
>>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
>>> hobj.spell('spookie')
False
>>> hobj.suggest('spookie')
['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
>>> hobj.spell('spooky')
True
>>> hobj.analyze('linked')
[' st:link fl:D']
>>> hobj.stem('linked')
['link']
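
`hobj.stem` returns a list, which may be empty for words the dictionary does not know, so a pre-processing step needs a fallback. A sketch, assuming `stem` behaves as in the session above; `dictionary_stem` is a hypothetical helper, and the bytes handling is only a precaution in case the binding returns bytes under Python 3:

```python
def dictionary_stem(word, stem_lookup):
    """Return the first stem reported by `stem_lookup`, or `word` if there is none.

    `stem_lookup` is a callable returning a list of candidate stems,
    for example the bound method `hobj.stem` from the session above.
    """
    candidates = stem_lookup(word)
    if not candidates:
        return word  # unknown word: keep it unchanged
    first = candidates[0]
    # Some bindings may return bytes; decode so callers always get str.
    return first.decode("utf-8") if isinstance(first, bytes) else first
```

With the objects from the session above, this would be called as `dictionary_stem('linked', hobj.stem)`.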

answered Oct 19 '22 23:10

0xF


Tags: python, module, preprocessor, nlp, stemming

Kai
