
Need a Python module for stemming text documents

I need a good Python module for stemming text documents in the pre-processing stage.

I found this one:

http://pypi.python.org/pypi/PyStemmer/1.0.1

but I cannot find the documentation at the link provided.

If anyone knows where to find the documentation, or knows of any other good stemming algorithm, please help.

Kai asked Apr 29 '12 03:04




2 Answers

You may want to try NLTK:

>>> from nltk import PorterStemmer
>>> PorterStemmer().stem('complications')
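Since the question is about stemming whole documents during pre-processing, here is a minimal sketch of how a per-word stemmer could be applied to a text. The helper name `stem_document`, the regex tokenizer, and the placeholder stemmer `strip_plural` are assumptions made for illustration; they are not part of NLTK.

```python
import re
from typing import Callable, List


def stem_document(text: str, stem: Callable[[str], str]) -> List[str]:
    """Lowercase the text, split it into alphabetic tokens, and stem each one."""
    # Crude tokenizer (an assumption): keep runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens]


def strip_plural(word: str) -> str:
    """Placeholder stemmer for the demo: drop a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


print(stem_document("Two cats, three dogs!", strip_plural))
```

With NLTK installed, `PorterStemmer().stem` can be passed as the `stem` argument in place of the placeholder.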
ditkin answered Oct 20 '22 01:10


All the stemmers discussed here are algorithmic stemmers, so they can produce unexpected results that are not real words, such as:

In [3]: from nltk.stem.porter import PorterStemmer

In [4]: stemmer = PorterStemmer()

In [5]: stemmer.stem('identified')
Out[5]: u'identifi'

In [6]: stemmer.stem('nonsensical')
Out[6]: u'nonsens'

To get the correct root words, one needs a dictionary-based stemmer such as the Hunspell stemmer. There is a Python implementation of it; example code follows:

>>> import hunspell
>>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
>>> hobj.spell('spookie')
False
>>> hobj.suggest('spookie')
['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
>>> hobj.spell('spooky')
True
>>> hobj.analyze('linked')
[' st:link fl:D']
>>> hobj.stem('linked')
['link']
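To illustrate the idea behind dictionary-based stemming, here is a rough pure-Python sketch. It is not how Hunspell is implemented; the rule list `RULES`, the tiny `LEXICON`, and the function `dictionary_stem` are all hypothetical and exist only for this example. A suffix rule is accepted only when the resulting candidate is a known word, so the output is always either a dictionary word or the unchanged input.

```python
# Hypothetical affix rules: (suffix to strip, text to append in its place).
RULES = [("ied", "y"), ("ical", "e"), ("ed", ""), ("s", "")]

# Hypothetical tiny dictionary of valid root words.
LEXICON = {"identify", "nonsense", "link"}


def dictionary_stem(word, lexicon=LEXICON, rules=RULES):
    """Return a root from the lexicon if a rule produces one, else the word itself."""
    if word in lexicon:
        return word
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept the rewrite only if the dictionary confirms it.
            if candidate in lexicon:
                return candidate
    return word  # unknown word: leave it untouched rather than guess


for w in ("identified", "nonsensical", "linked", "spookie"):
    print(w, "->", dictionary_stem(w))
```

The trade-off is coverage: words missing from the dictionary are returned unchanged, whereas an algorithmic stemmer always produces some stem.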
0xF answered Oct 19 '22 23:10