How do I calculate the shortest path (geodesic) distance between two adjectives in WordNet using Python NLTK?

Computing the semantic similarity between two synsets in WordNet can be easily done with several built-in similarity measures, such as:

synset1.path_similarity(synset2)

synset1.lch_similarity(synset2), Leacock-Chodorow Similarity

synset1.wup_similarity(synset2), Wu-Palmer Similarity

(as seen here)

However, all of these exploit WordNet's taxonomic (hypernym/hyponym) relations, which exist only for nouns and verbs. Adjectives and adverbs are instead related via synonymy, antonymy and pertainymy. How can one measure the distance (number of hops) between two adjectives?

I tried path_similarity(), but as expected, it returns None:

from nltk.corpus import wordnet as wn

x = wn.synset('good.a.01')
y = wn.synset('bad.a.01')

print(wn.path_similarity(x, y))

If there is any way to compute the distance between one adjective and another, pointing it out would be greatly appreciated.

asked Jul 05 '15 19:07

modarwish

1 Answer

There's no easy way to get similarity between words that are not nouns/verbs.

As noted, similarities between nouns or between verbs are easy to get from the built-in measures:

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.1')
>>> cat = wn.synset('cat.n.1')
>>> car = wn.synset('car.n.1')
>>> wn.path_similarity(dog, cat)
0.2
>>> wn.path_similarity(dog, car)
0.07692307692307693
>>> wn.wup_similarity(dog, cat)
0.8571428571428571
>>> wn.wup_similarity(dog, car)
0.4
>>> wn.lch_similarity(dog, car)
1.072636802264849
>>> wn.lch_similarity(dog, cat)
2.0281482472922856

For adjectives it's hard, so you would need to build your own similarity measure. The easiest way is to use a vector space model: every word is represented by a vector of floating point numbers, and similarity is the cosine of the angle between two vectors, e.g.

>>> import numpy as np
>>> blue = np.array([0.2, 0.2, 0.3])
>>> red = np.array([0.1, 0.2, 0.3])
>>> pink = np.array([0.1001, 0.221, 0.321])
>>> car = np.array([0.6, 0.9, 0.5])
>>> def cosine(x,y):
...     return np.dot(x,y) / (np.linalg.norm(x) * np.linalg.norm(y))
... 
>>> cosine(pink, red)
0.99971271929384864
>>> cosine(pink, blue)
0.96756147991512709
>>> cosine(blue, red)
0.97230558532824662
>>> cosine(blue, car)
0.91589118863996888
>>> cosine(red, car)
0.87469454283170045
>>> cosine(pink, car)
0.87482313596223782

To learn real vectors in place of hand-written ones like pink = np.array([0.1001, 0.221, 0.321]), try searching for:

Latent semantic indexing / Latent semantic analysis
Bag of Words
Vector space model semantics
Word2Vec, Doc2Vec, Wiki2Vec
Neural Nets
cosine similarity natural language semantics
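
To give a feel for where such vectors come from, here is a minimal count-based sketch: each word's vector is just the counts of the words that appear near it. The three-sentence corpus, the window size and the function names are all made up for illustration; real models such as Word2Vec learn dense vectors from far more text.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model needs far more text.
corpus = [
    "the red car is fast",
    "the pink car is fast",
    "the blue sky is wide",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector counting the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def sparse_cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vecs = cooccurrence_vectors(corpus)
print(sparse_cosine(vecs["red"], vecs["pink"]))  # same contexts, so close to 1
print(sparse_cosine(vecs["red"], vecs["blue"]))  # fewer shared contexts, so lower
```

In this toy corpus "red" and "pink" occur in identical contexts, so they come out as more similar to each other than either is to "blue".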

You can also try some off-the-shelf software / libraries like:

Gensim https://radimrehurek.com/gensim/
http://webcache.googleusercontent.com/search?q=cache:u5y4He592qgJ:takelab.fer.hr/sts/+&cd=2&hl=en&ct=clnk&gl=sg

Besides vector space models, you can try a graph-based model that puts words into a graph and uses something like PageRank to walk around the graph and give you a similarity measure.
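
If what you want is literally the number of hops, the graph idea can be sketched as a plain breadth-first search. This is only a sketch, not an NLTK feature: shortest_hops, adjective_neighbors and the toy edge list are names made up for illustration, and which relations count as edges (here similar-to, also-see, antonyms and pertainyms) is an assumption you would want to tune. The demo runs on a hand-written toy graph, so adjective_neighbors is shown but not exercised against WordNet data.

```python
from collections import defaultdict, deque

def shortest_hops(start, goal, neighbors):
    """Breadth-first search: number of edges on a shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors(node):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def adjective_neighbors(synset):
    """Assumed edge set for an NLTK adjective Synset: similar-to, also-see,
    plus the synsets of lemma-level antonyms and pertainyms."""
    linked = set(synset.similar_tos()) | set(synset.also_sees())
    for lemma in synset.lemmas():
        for other in lemma.antonyms() + lemma.pertainyms():
            linked.add(other.synset())
    return linked

# Hand-written toy graph standing in for WordNet (illustrative only).
toy_edges = [("beneficial", "good"), ("good", "bad"), ("bad", "awful")]
toy_graph = defaultdict(set)
for a, b in toy_edges:
    toy_graph[a].add(b)
    toy_graph[b].add(a)

def toy_neighbors(word):
    return toy_graph.get(word, ())

print(shortest_hops("beneficial", "awful", toy_neighbors))  # 3
```

With WordNet loaded, the same search would be called as shortest_hops(wn.synset('good.a.01'), wn.synset('bad.a.01'), adjective_neighbors). On the full graph you may want to cap the search depth, since the adjective clusters can be large.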

alvas

Tags: python, nlp, nltk, cosine-similarity, wordnet
