 

Using Wordnet to generate superlative, comparative and adjectives

I have a wordnet database setup, and I'm trying to generate synonyms for various words.

Take the word "greatest", for example. I find several synonyms, but none of them really fits the definition; one of them, for example, is "superlative".

I'm guessing that I need to either check word frequency in the given language or stem the word to get its base form (for example, greatest -> great, great -> best).

What table should I be using to ensure my words make some modicum of sense?

Asked Nov 30 '14 at 19:11 by Steven Matthews


1 Answer

With their default settings, neither the Porter stemmer nor the WordNet lemmatizer gets you from greatest -> great:

>>> from nltk.stem import WordNetLemmatizer, PorterStemmer
>>> porter = PorterStemmer()
>>> wnl = WordNetLemmatizer()
>>> greatest = 'greatest'
>>> porter.stem(greatest)
'greatest'
>>> wnl.lemmatize(greatest)  # default pos='n' (noun)
'greatest'
>>> greater = 'greater'
>>> wnl.lemmatize(greater)
'greater'
>>> porter.stem(greater)
'greater'
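The lemmatizer result above comes from its default part of speech, `pos='n'`. WordNet's morphological processor (morphy) does have adjective suffix rules, and NLTK's `WordNetLemmatizer.lemmatize` accepts a `pos` argument (`'a'` for adjectives) that makes it try them. The dependency-free sketch below mimics those adjective rules; `adjective_lemmas` and the tiny lexicon are hypothetical stand-ins for the real WordNet index lookup:

```python
# Adjective detachment rules, as listed for ADJ in NLTK's WordNet reader:
# strip -er/-est, or replace them with -e.
ADJ_SUBSTITUTIONS = [("er", ""), ("est", ""), ("er", "e"), ("est", "e")]

# Hypothetical stand-in for WordNet's adjective index; a real lookup would
# query the WordNet database instead of this hard-coded set.
ADJ_LEXICON = {"great", "large", "fine", "tall"}


def adjective_lemmas(word, lexicon=ADJ_LEXICON):
    """Return candidate base forms of `word` that exist in `lexicon`."""
    candidates = [word]  # the word itself may already be a lemma
    for old, new in ADJ_SUBSTITUTIONS:
        if word.endswith(old):
            candidates.append(word[: -len(old)] + new)
    # Keep only known adjectives, deduplicated, in rule order.
    found = []
    for form in candidates:
        if form in lexicon and form not in found:
            found.append(form)
    return found


for w in ["greatest", "greater", "largest", "finer", "great"]:
    print(w, "->", adjective_lemmas(w))
```

The lexicon check is what keeps spurious candidates such as "greate" or "larg" out of the result, which is the same reason morphy filters its rule output against the WordNet index.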

It seems, however, that you can exploit a useful property of the Penn Treebank tagset, which tags base, comparative and superlative adjectives differently (JJ, JJR and JJS), to get from greatest -> great:

>>> from nltk import pos_tag
>>> pos_tag(['greatest'])
[('greatest', 'JJS')]
>>> pos_tag(['greater'])
[('greater', 'JJR')]
>>> pos_tag(['great'])
[('great', 'JJ')]
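Because the degree of comparison is encoded directly in the tag, a plain lookup is enough to read it off. A minimal sketch; `PENN_ADJ_DEGREE` and `adjective_degree` are hypothetical names, and the tags are copied from the session above rather than produced by a tagger:

```python
# Penn Treebank adjective tags and the degree of comparison they mark.
PENN_ADJ_DEGREE = {"JJ": "base", "JJR": "comparative", "JJS": "superlative"}


def adjective_degree(tag):
    """Map a Penn Treebank tag to 'base', 'comparative', 'superlative' or None."""
    return PENN_ADJ_DEGREE.get(tag)


# Tags copied from the pos_tag output above.
tagged = [("greatest", "JJS"), ("greater", "JJR"), ("great", "JJ")]
for word, tag in tagged:
    print(word, tag, adjective_degree(tag))
```

The tagset marks adverbs the same way (RB, RBR, RBS), so the table could be extended if adverbs matter too.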

Let's try a crude rule-based system, starting from greatest:

>>> import re
>>> word1 = 'greatest'
>>> re.sub('est$', '', word1) 
'great'
>>> re.sub('est$', 'er', word1) 
'greater'
>>> pos_tag([re.sub('est$', '', word1)])[0][1]
'JJ'
>>> pos_tag([re.sub('est$', 'er', word1)])[0][1]
'JJR'
>>> word1
'greatest'
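The bare `re.sub('est$', ...)` rule only works when the superlative is formed by plain suffixation. English spelling often changes the stem (big -> biggest, happy -> happiest, large -> largest), so a slightly more careful version might generate several candidates and let a tagger or lexicon pick the right one. A sketch; `base_candidates` is a hypothetical helper, and its spelling rules are a simplification:

```python
import re


def base_candidates(superlative):
    """Hypothetical helper: possible base forms for an -est superlative.

    Besides plain stripping (greatest -> great), it tries common spelling
    adjustments that the single regex misses. The candidates still need to
    be checked, e.g. with a POS tagger or a lexicon.
    """
    if not superlative.endswith("est"):
        return []
    stem = re.sub("est$", "", superlative)
    candidates = [stem]
    # Doubled final consonant: biggest -> bigg -> big
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
        candidates.append(stem[:-1])
    # y changed to i: happiest -> happi -> happy
    if stem.endswith("i"):
        candidates.append(stem[:-1] + "y")
    # Dropped final e: largest -> larg -> large
    candidates.append(stem + "e")
    return candidates


for w in ["greatest", "biggest", "happiest", "largest"]:
    print(w, "->", base_candidates(w))
```

Over-generating and then filtering keeps the rules simple; a candidate like "greate" is harmless as long as the check step rejects it.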

Now that we know we can build our own little superlative stemmer/lemmatizer/tail_substituter, let's write a rule: if a word gets a superlative POS tag (JJS), and stripping the -est tail gives a word tagged JJ while replacing it with -er gives a word tagged JJR, then we can safely derive the base and comparative forms with our tail_substituter:

>>> if pos_tag([word1])[0][1] == 'JJS' \
... and pos_tag([re.sub('est$', '', word1)])[0][1] == 'JJ' \
... and pos_tag([re.sub('est$', 'er', word1)])[0][1] == 'JJR':
...     comparative = re.sub('est$', 'er', word1)
...     adjective = re.sub('est$', '', word1)
... 
>>> adjective
'great'
>>> comparative
'greater'
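The same rule can be packaged as a reusable function. In the sketch below the tagger is passed in as a parameter so the logic can be shown without NLTK; `superlative_forms`, `STUB_TAGS` and `stub_tag` are hypothetical names, and the stub tags are illustrative only:

```python
import re
from typing import Callable, Optional, Tuple

# A tagger here is any function mapping a single word to a Penn Treebank tag.
Tagger = Callable[[str], str]


def superlative_forms(word: str, tag: Tagger) -> Optional[Tuple[str, str]]:
    """Return (base, comparative) for a regular -est superlative, else None.

    Mirrors the rule above: the word must be tagged JJS, its -est-stripped
    form JJ, and its -est -> -er form JJR.
    """
    if tag(word) != "JJS":
        return None
    base = re.sub("est$", "", word)
    comparative = re.sub("est$", "er", word)
    if tag(base) == "JJ" and tag(comparative) == "JJR":
        return base, comparative
    return None


# Stub tagger for illustration only: a fixed lookup standing in for a real
# POS tagger, whose tags for isolated words may differ from these.
STUB_TAGS = {"greatest": "JJS", "great": "JJ", "greater": "JJR", "best": "JJS"}


def stub_tag(w: str) -> str:
    return STUB_TAGS.get(w, "NN")


print(superlative_forms("greatest", stub_tag))  # ('great', 'greater')
print(superlative_forms("best", stub_tag))      # None: "b" is not tagged JJ
```

With NLTK installed and its tagger model available, you could pass `lambda w: pos_tag([w])[0][1]` instead of the stub. Returning `None` explicitly also avoids the interactive version's problem of leaving `adjective` and `comparative` undefined when the condition fails.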

That gets you from greatest -> greater -> great. Going from great -> best is odd, though: lexically the two words are not related, even though their meanings seem related.

So I think it would be subjective to call great -> best a valid transformation.
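Lexically irregular comparison (good, better, best) cannot be handled by suffix rules at all; WordNet deals with such forms through an exception list (adj.exc) that maps, for example, better back to good. The hypothetical lookup table below works in that spirit and also shows why great -> best falls outside any such mapping: best is the superlative of good, not of great. `IRREGULAR` and `base_adjective` are illustrative names, and the suffix fallback is deliberately naive:

```python
# Hypothetical irregular-comparison table, in the spirit of WordNet's
# adjective exception list: inflected form -> base adjective.
IRREGULAR = {
    "better": "good",
    "best": "good",
    "worse": "bad",
    "worst": "bad",
}


def base_adjective(word):
    """Look up irregular forms first, then fall back to stripping -est/-er.

    The fallback is naive: it would also mangle words like "water".
    """
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("est", "er"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


print(base_adjective("best"))      # good
print(base_adjective("greatest"))  # great
```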

Answered Oct 18 '22 at 18:10 by alvas