Getting the root word using the Wordnet Lemmatizer

Question

I need to find a common root word that all related words map to, for use in a keyword extractor.

How can I convert words into the same root using the Python NLTK lemmatizer?

For example:
1. generalized, generalization -> general
2. optimal, optimized -> optimize (maybe)
3. configure, configuration, configured -> configure

The Python NLTK lemmatizer gives 'generalize' for 'generalized' and 'generalizing' when the part-of-speech (POS) tag parameter is used, but not for 'generalization'.

Is there a way to do this?

Ani Menon · Accepted Answer

Use SnowballStemmer:

>>> from nltk.stem.snowball import SnowballStemmer
>>> stemmer = SnowballStemmer("english")
>>> print(stemmer.stem("generalized"))
general
>>> print(stemmer.stem("generalization"))
general

Note: Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech.
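Building on the SnowballStemmer usage above, here is a minimal sketch of how a keyword extractor might group related words under a shared stem. `group_by_stem` is a hypothetical helper, not part of NLTK, and the keyword list is illustrative. Keep in mind that a stem is a truncated form and is not guaranteed to be a dictionary word.

```python
from collections import defaultdict

from nltk.stem.snowball import SnowballStemmer


def group_by_stem(words, language="english"):
    """Hypothetical helper: group words that reduce to the same Snowball stem."""
    stemmer = SnowballStemmer(language)
    groups = defaultdict(list)
    for word in words:
        # Words sharing a stem end up in the same bucket, in input order.
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


keywords = [
    "generalized", "generalization",
    "optimal", "optimized",
    "configure", "configuration", "configured",
]
for stem, members in group_by_stem(keywords).items():
    print(stem, members)
```

Because the stemmer needs no POS tag or corpus lookup, this works on raw tokens, at the cost of sometimes producing stems that are not real words.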

A common issue I have seen with lemmatizers is that they treat even longer, derived words as lemmas.

Example, with the WordNet Lemmatizer (checked in NLTK):

Generalized => Generalize
Generalization => Generalization
Generalizations => Generalization

The POS tag was not given as input in the above cases, so each word was treated as a noun (the default).

Getting the root word using the Wordnet Lemmatizer

Tags:

python

nlp

nltk

lemmatization

wordnet

Shanika Ediriweera

1 Answers

Ani Menon
