NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?

Tags: python, nlp, nltk

I'm using the NLTK WordNet Lemmatizer for a part-of-speech tagging project. I first replace each word in the training corpus with its stem (modifying the corpus in place), and then train only on the new corpus. However, I found that the lemmatizer does not behave as I expected.

For example, the word "loves" is lemmatized to "love", which is correct, but the word "loving" remains "loving" even after lemmatization. Here "loving" is used as in the sentence "I'm loving it".

Isn't "love" the stem of the inflected word "loving"? Similarly, many other "-ing" forms remain unchanged after lemmatization. Is this the correct behavior?

What are some other accurate lemmatizers (not necessarily in NLTK)? Are there morphological analyzers or lemmatizers that also take a word's part-of-speech tag into account when deciding the word stem? For example, the word "killing" should have "kill" as its stem if it is used as a verb, but "killing" as its stem if it is used as a noun (as in "the killing was done by xyz").

asked Aug 27 '14 18:08

sanjeev mk

1 Answer

The WordNet lemmatizer does take the POS tag into account, but it doesn't magically determine it:

>>> import nltk
>>> nltk.stem.WordNetLemmatizer().lemmatize('loving')
'loving'
>>> nltk.stem.WordNetLemmatizer().lemmatize('loving', 'v')
u'love'

Without a POS tag, it assumes everything you feed it is a noun. So here it thinks you're passing it the noun "loving" (as in "sweet loving").
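Since the training corpus in the question is already POS-tagged, one way to supply that second argument is to translate each corpus tag into the single-letter code WordNet expects. The following is only a rough sketch, assuming the corpus uses Penn Treebank tags; the helper names penn_to_wordnet and lemmatize_tagged are made up for illustration and are not part of NLTK, and the tags in the demo are written by hand.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to the single-letter POS code WordNet uses.

    Hypothetical helper, assuming Penn Treebank tags in the corpus.
    """
    if tag.startswith('JJ'):
        return 'a'  # adjective
    if tag.startswith('VB'):
        return 'v'  # verb
    if tag.startswith('RB'):
        return 'r'  # adverb
    # Everything else falls back to noun, which is also the
    # lemmatizer's own default when no POS is given.
    return 'n'


def lemmatize_tagged(tagged_tokens):
    """Lemmatize (word, penn_tag) pairs, passing the mapped POS along."""
    # Imported here so the mapping above can be used without NLTK;
    # this call needs NLTK installed and the 'wordnet' corpus downloaded.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(word, penn_to_wordnet(tag))
            for word, tag in tagged_tokens]


if __name__ == '__main__':
    # Hand-written tags for the two examples from the question.
    print(lemmatize_tagged([('I', 'PRP'), ('am', 'VBP'),
                            ('loving', 'VBG'), ('it', 'PRP')]))
    print(lemmatize_tagged([('the', 'DT'), ('killing', 'NN'),
                            ('was', 'VBD'), ('done', 'VBN')]))
```

With a mapping like this, "killing" tagged VBG would be looked up as a verb and "killing" tagged NN as a noun, which is the distinction the question asks about. The result is only as good as the tags that are fed in.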

answered Sep 19 '22 08:09

Fred Foo
