
SnowballStemmer for Russian words list

I know how to apply SnowballStemmer to a single word (in my case, a Russian one). I do the following:

from nltk.stem.snowball import SnowballStemmer 

stemmer = SnowballStemmer("russian") 
stemmer.stem("Василий")
'васил'

How can I do the same if I have a list of words like ['Василий', 'Геннадий', 'Виталий']?

My approach using a list comprehension does not seem to work :(

l=[stemmer.stem(word) for word in l]
Keithx asked Aug 15 '17 15:08




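1 Answer

Your list comprehension iterates over l, but l has not been defined at that point, which raises the NameError. Iterate over the list that actually holds your words instead; see my last two commands for the fix.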

>>> from nltk.stem.snowball import SnowballStemmer
>>> stemmer = SnowballStemmer("russian") 
>>> my_words = ['Василий', 'Геннадий', 'Виталий']
>>> l=[stemmer.stem(word) for word in l]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'l' is not defined
>>> l=[stemmer.stem(word) for word in my_words]
>>> l
['васил', 'геннад', 'витал']
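
The same fix can be wrapped in a small helper so the input list is always passed in explicitly and never read from an undefined name. This is only a sketch: the names stem_all and stem_russian are hypothetical and not part of NLTK. The only NLTK calls used are SnowballStemmer("russian") and its stem() method, as in the session above.

```python
from typing import Callable, Iterable, List


def stem_all(words: Iterable[str], stem: Callable[[str], str]) -> List[str]:
    """Apply a stemming function to every word and return a new list.

    Hypothetical helper: `stem` can be any callable that maps str -> str,
    for example the bound method `stemmer.stem` of an NLTK stemmer.
    """
    return [stem(word) for word in words]


def stem_russian(words: Iterable[str]) -> List[str]:
    """Stem Russian words with NLTK's SnowballStemmer (hypothetical wrapper)."""
    # Imported lazily so stem_all stays usable without NLTK installed.
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("russian")
    return stem_all(words, stemmer.stem)
```

With NLTK installed, stem_russian(['Василий', 'Геннадий', 'Виталий']) should reproduce the list shown in the session above. Because the words arrive as a parameter, a missing or misspelled list name fails at the call site instead of inside the comprehension.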
Nathan Smith answered Oct 16 '22 15:10
