WordNet lemmatizer in NLTK: what is the correct lemma for "boss"?

Question

I use NLTK 3.0.4 and noticed that the lemmas for the words boss and bosses are different.

from nltk.stem.wordnet import WordNetLemmatizer

wnl = WordNetLemmatizer()

print(wnl.lemmatize("boss", "n"))
# prints "bos"

print(wnl.lemmatize("bosses", "n"))
# prints "boss"

This looks like weird behavior to me, especially since boss is a known word in WordNet and there is a rule to keep ss.

Does anyone have an explanation, or is this just a bug? How should I deal with it?

b3000 · Accepted Answer

After checking the code (_morphy()) that generates the possible analyses for a given word, I found that no rule to keep ss is included.
Bos is also a base form in WordNet.

Substitution rules:

MORPHOLOGICAL_SUBSTITUTIONS = {
    NOUN: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
           ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
           ('men', 'man'), ('ies', 'y')],
    VERB: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
           ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
    ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
    ADV: []}
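To make the effect of the noun rules concrete, here is a minimal sketch that applies them to both words. The helper name apply_rules is an assumption made for illustration (it is not quoted from NLTK), and the sketch only generates candidates; it does not check them against WordNet.

```python
# Noun rules copied from the table above.
NOUN_SUBSTITUTIONS = [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
                      ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
                      ('men', 'man'), ('ies', 'y')]


def apply_rules(word, substitutions=NOUN_SUBSTITUTIONS):
    """Return every candidate produced by a rule whose suffix matches (hypothetical helper)."""
    return [word[:-len(old)] + new
            for old, new in substitutions
            if word.endswith(old)]


print(apply_rules("boss"))    # ['bos']
print(apply_rules("bosses"))  # ['bosse', 'boss']
```

Only the ('s', '') rule matches boss, so bos is the single candidate; nothing in the table protects a trailing ss.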

Calling wnl.lemmatize("boss", "n"):

Applying the substitution rules (here the ('s', '') rule, which strips the final s) yields bos, and since bos is a base form in WordNet, it is returned. If bos were not included in WordNet, the lemma for boss would be boss, since no shorter form could be found.
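The selection step can be sketched without WordNet installed. This is a toy model under stated assumptions: TOY_NOUN_LEXICON is a hypothetical stand-in for the WordNet noun index, the function names are made up for illustration, and the "shortest known candidate wins" rule is inferred from the explanation above rather than quoted from the NLTK source. The last function shows one possible way to deal with the problem, not an NLTK feature.

```python
NOUN_SUBSTITUTIONS = [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
                      ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
                      ('men', 'man'), ('ies', 'y')]

# Hypothetical stand-in for the WordNet noun index (assumption).
TOY_NOUN_LEXICON = {"boss", "bos"}


def known_candidates(word, lexicon=TOY_NOUN_LEXICON):
    """The word itself plus every rule output, keeping only forms found in the lexicon."""
    forms = [word] + [word[:-len(old)] + new
                      for old, new in NOUN_SUBSTITUTIONS
                      if word.endswith(old)]
    return [form for form in forms if form in lexicon]


def shortest_lemma(word, lexicon=TOY_NOUN_LEXICON):
    """Assumed selection rule: the shortest known candidate wins."""
    found = known_candidates(word, lexicon)
    return min(found, key=len) if found else word


def lemma_preferring_known_word(word, lexicon=TOY_NOUN_LEXICON):
    """Possible workaround: leave a word alone if it is already a known base form."""
    return word if word in lexicon else shortest_lemma(word, lexicon)


print(shortest_lemma("boss"))                 # bos
print(shortest_lemma("bosses"))               # boss
print(shortest_lemma("boss", {"boss"}))       # boss (no "bos" entry to compete)
print(lemma_preferring_known_word("boss"))    # boss
print(lemma_preferring_known_word("bosses"))  # boss
```

The workaround has a cost: any plural that is itself listed as a base form would also be left unchanged, so whether it is acceptable depends on the vocabulary being processed.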

Tags:

python

nltk

lemmatization

wordnet

gakhov
