
WordNetLemmatizer: Different handling of wn.ADJ and wn.ADJ_SAT?

I need to lemmatize text using nltk. In order to do this, I apply nltk.pos_tag to each sentence and then convert the resulting Penn Treebank tags (http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) to WordNet tags. I need to do this because WordNetLemmatizer.lemmatize() expects both the word and its correct pos_tag as arguments, otherwise it will just assume everything is a verb.

I just found that there are five different tags defined in WordNet:

  • wn.VERB
  • wn.ADV
  • wn.NOUN
  • wn.ADJ
  • wn.ADJ_SAT

However, every example I found on the internet just ignores wn.ADJ_SAT when converting Treebank tags to WordNet tags. They are all just mapping Penn tags to WordNet tags like this:

  • If Penn tag starts with J: convert to wn.ADJ
  • If Penn tag starts with V: convert to wn.VERB
  • If Penn tag starts with N: convert to wn.NOUN
  • If Penn tag starts with R: convert to wn.ADV

So wn.ADJ_SAT is never used.

My question is whether there are cases where the lemmatizer returns a different result for ADJ_SAT than for ADJ. What are examples of words that are satellite adjectives (ADJ_SAT) but not normal adjectives (ADJ)?

Simon Hessner asked Aug 01 '18 13:08



1 Answer

The WordNetLemmatizer in NLTK does not differentiate satellite adjectives from normal adjectives.

nltk.stem.WordNetLemmatizer.lemmatize uses "WordNet’s built-in morphy function. Returns the input word unchanged if it cannot be found in WordNet."

In WordNet, a satellite adjective (more broadly, a satellite synset) is a semantic label used elsewhere in WordNet rather than a special part of speech in nltk.

From the wordnet glossary:

Satellite Synset: Synset in an adjective cluster representing a concept that is similar in meaning to the concept represented by its head synset.

User tripleee points out the following in this question:

adjectives are subcategorized into 'head' and 'satellite' synsets within an adjective cluster

Also, according to the nltk documentation, nltk.stem.WordNetLemmatizer.lemmatize defaults to noun, not verb, when no part of speech is specified.
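Because the default is noun, it helps to pass the part of speech explicitly for every token. Below is a minimal sketch of doing so for `(word, Penn tag)` pairs. The helper `lemmatize_tagged` and the stand-in `show_pos` are hypothetical names. The lemmatizer is passed in as a callable so the snippet runs without NLTK; in practice one would presumably pass `WordNetLemmatizer().lemmatize`, which accepts a word and a `pos` argument.

```python
def lemmatize_tagged(tagged_tokens, lemmatize, default_pos="n"):
    """Lemmatize (word, penn_tag) pairs, always passing an explicit WordNet POS.

    `lemmatize` is any callable taking (word, pos), for example
    nltk.stem.WordNetLemmatizer().lemmatize (assumed, not imported here).
    """
    prefix_to_pos = {"J": "a", "V": "v", "N": "n", "R": "r"}
    lemmas = []
    for word, penn_tag in tagged_tokens:
        # Tags outside the four groups fall back to noun, the same POS the
        # lemmatizer would use if none were given.
        pos = prefix_to_pos.get(penn_tag[:1], default_pos)
        lemmas.append(lemmatize(word, pos))
    return lemmas


# Toy stand-in for a real lemmatizer: it only records the POS it received.
def show_pos(word, pos):
    return f"{word}/{pos}"


print(lemmatize_tagged([("better", "JJR"), ("ran", "VBD"), ("the", "DT")], show_pos))
# → ['better/a', 'ran/v', 'the/n']
```

Adjective tags are mapped to "a" only; given the answer above, adding a separate branch for "s" should not be needed for lemmatization.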

matt_07734 answered Sep 21 '22 16:09