I need a good Python module for stemming text documents in the pre-processing stage.
I found this one:
http://pypi.python.org/pypi/PyStemmer/1.0.1
but I cannot find the documentation at the link provided.
If anyone knows where to find the documentation, or knows of any other good stemming algorithm, please help.
Porter's stemmer is one of the most widely used stemming techniques in natural language processing, but since it has been almost 30 years since its first implementation and development, Martin Porter developed an updated version called Porter2, which is also commonly called the Snowball stemmer due to its nltk ...
You may want to try NLTK:
>>> from nltk import PorterStemmer
>>> PorterStemmer().stem('complications')
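For the pre-processing step the question asks about, a minimal sketch of stemming a whole document with NLTK might look like the following. The helper name `stem_document` and the regex tokenizer are assumptions made for illustration and are not part of NLTK; `PorterStemmer` and `SnowballStemmer("english")` are NLTK's own classes, and any object with a `.stem(word)` method should work in their place.

```python
import re

from nltk.stem import PorterStemmer, SnowballStemmer


def stem_document(text, stemmer):
    """Lowercase the text, split it into alphabetic tokens, and stem each token.

    `stem_document` is a hypothetical helper, not an NLTK function.
    The regex tokenizer is a deliberate simplification: it drops digits
    and punctuation and needs no extra NLTK data downloads.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(token) for token in tokens]


porter = PorterStemmer()
snowball = SnowballStemmer("english")  # the Porter2 / Snowball English stemmer

text = "complications identified"
print(stem_document(text, porter))    # ['complic', 'identifi']
print(stem_document(text, snowball))
```

Passing the stemmer in as an argument makes it easy to swap Porter for Snowball (or another stemmer) and compare the results on your own documents.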
All the stemmers discussed here are algorithmic stemmers, so they can produce unexpected results such as:
In [3]: from nltk.stem.porter import *
In [4]: stemmer = PorterStemmer()
In [5]: stemmer.stem('identified')
Out[5]: u'identifi'
In [6]: stemmer.stem('nonsensical')
Out[6]: u'nonsens'
To get the correct root words, one needs a dictionary-based stemmer such as the Hunspell stemmer. Here is a Python implementation of it in the following link. Example code is here:
>>> import hunspell
>>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
>>> hobj.spell('spookie')
False
>>> hobj.suggest('spookie')
['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
>>> hobj.spell('spooky')
True
>>> hobj.analyze('linked')
[' st:link fl:D']
>>> hobj.stem('linked')
['link']
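Since a dictionary-based stemmer only knows the words in its dictionary, one possible approach is to try the dictionary first and fall back to an algorithmic stemmer for unknown words. The sketch below assumes that the dictionary stemmer returns a list of candidate roots, as `hobj.stem` does above; the function name `root_or_fallback` and its parameters are hypothetical. Some versions of the Hunspell bindings may return `bytes` rather than `str`, so the sketch decodes them, assuming UTF-8.

```python
def root_or_fallback(word, dict_stem, algo_stem):
    """Return a dictionary root for `word`, or an algorithmic stem if none exists.

    dict_stem: callable returning a list of candidate roots (e.g. hobj.stem)
    algo_stem: callable returning a single stem (e.g. PorterStemmer().stem)
    Both parameters are assumptions for this sketch, not a library API.
    """
    roots = dict_stem(word)
    if roots:
        root = roots[0]  # take the first candidate; others are ignored here
        # Assumption: byte results are UTF-8 encoded.
        return root.decode("utf-8") if isinstance(root, bytes) else root
    # The word is not in the dictionary, so use the algorithmic stemmer.
    return algo_stem(word)
```

With the objects from the examples above, this would be called as `root_or_fallback('linked', hobj.stem, PorterStemmer().stem)`.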