Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?

Tags:

python

nlp

nltk

I'm using the NLTK WordNet Lemmatizer for a Part-of-Speech tagging project by first modifying each word in the training corpus to its stem (in place modification), and then training only on the new corpus. However, I found that the lemmatizer is not functioning as I expected it to.

For example, the word loves is lemmatized to love which is correct, but the word loving remains loving even after lemmatization. Here loving is as in the sentence "I'm loving it".

Isn't love the stem of the inflected word loving? Similarly, many other 'ing' forms remain as they are after lemmatization. Is this the correct behavior?

What are some other lemmatizers that are accurate? (need not be in NLTK) Are there morphology analyzers or lemmatizers that also take into account a word's Part Of Speech tag, in deciding the word stem? For example, the word killing should have kill as the stem if killing is used as a verb, but it should have killing as the stem if it is used as a noun (as in the killing was done by xyz).

like image 519
sanjeev mk Avatar asked Aug 27 '14 18:08

sanjeev mk


People also ask

How do you use WordNet Lemmatizer?

In order to lemmatize, you need to create an instance of the WordNetLemmatizer() and call the lemmatize() function on a single word. Let's lemmatize a simple sentence. We first tokenize the sentence into words using nltk. word_tokenize and then we will call lemmatizer.

What is the best Lemmatizer?

Wordnet is a publicly available lexical database of over 200 languages that provides semantic relationships between its words. It is one of the earliest and most commonly used lemmatizer technique.

What is WordNet Python?

The WordNet is a part of Python's Natural Language Toolkit. It is a large word database of English Nouns, Adjectives, Adverbs and Verbs. These are grouped into some set of cognitive synonyms, which are called synsets. To use the Wordnet, at first we have to install the NLTK module, then download the WordNet package.


1 Answers

The WordNet lemmatizer does take the POS tag into account, but it doesn't magically determine it:

>>> nltk.stem.WordNetLemmatizer().lemmatize('loving') 'loving' >>> nltk.stem.WordNetLemmatizer().lemmatize('loving', 'v') u'love' 

Without a POS tag, it assumes everything you feed it is a noun. So here it thinks you're passing it the noun "loving" (as in "sweet loving").

like image 131
Fred Foo Avatar answered Sep 19 '22 08:09

Fred Foo