
How do I calculate the shortest path (geodesic) distance between two adjectives in WordNet using Python NLTK?

Computing the semantic similarity between two synsets in WordNet can be easily done with several built-in similarity measures, such as:

synset1.path_similarity(synset2)

synset1.lch_similarity(synset2), Leacock-Chodorow Similarity

synset1.wup_similarity(synset2), Wu-Palmer Similarity

(as seen here)

However, all of these exploit WordNet's taxonomic (is-a) relations, which are only defined for nouns and verbs. Adjectives and adverbs are instead related via synonymy, antonymy and pertainyms. How can one measure the distance (number of hops) between two adjectives?
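For context, those adjective relations can be inspected directly on the synset and lemma objects in NLTK; a small sketch (output comments are indicative):

from nltk.corpus import wordnet as wn

good = wn.synset('good.a.01')

# Adjective relations hang off the synset and its lemmas, not off a hypernym tree
print(good.similar_tos())             # "similar to" satellite synsets
print(good.lemmas()[0].antonyms())    # e.g. [Lemma('bad.a.01.bad')]
print(good.lemmas()[0].pertainyms())  # pertainym links (often empty for adjectives)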

I tried path_similarity(), but as expected, it returns None:

from nltk.corpus import wordnet as wn
x = wn.synset('good.a.01')
y = wn.synset('bad.a.01')

print(wn.path_similarity(x, y))

If there is any way to compute the distance between one adjective and another, pointing it out would be greatly appreciated.

asked Jul 05 '15 by modarwish

People also ask

How do you use WordNet in Python?

To use WordNet, we first have to install the NLTK module and then download the WordNet package. In WordNet, words are grouped into sets whose members share the same meaning. In the first example, we will see how WordNet returns the meaning and other details of a word.
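For example, a minimal session might look like this (assuming NLTK is already installed, e.g. via pip install nltk):

import nltk
nltk.download('wordnet')  # one-time download of the WordNet data

from nltk.corpus import wordnet as wn

# Look up a word and print details of its first sense
syns = wn.synsets('program')
print(syns[0].name())        # e.g. 'plan.n.01'
print(syns[0].definition())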

What is Synset in WordNet?

WordNet groups English words into sets of synonyms, referred to as Synsets (short for "synonym sets"). Every Synset name contains a lemma, a part of speech (noun, verb, adverb, or adjective), and a sense number. Synsets are used to store synonyms, where each word in the Synset shares the same meaning.
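A small sketch of unpacking those parts from a Synset:

from nltk.corpus import wordnet as wn

s = wn.synset('good.a.01')
print(s.name())         # 'good.a.01': lemma, part of speech, sense number
print(s.pos())          # 'a' for adjective
print(s.lemma_names())  # the synonymous words stored in this Synset
print(s.definition())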

What is Lemma in WordNet?

In linguistics, the canonical or dictionary form of a word is called a lemma. To find synonyms as well as antonyms of a word, we can also look up lemmas in WordNet.
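A short sketch of collecting synonyms and antonyms from lemmas:

from nltk.corpus import wordnet as wn

synonyms, antonyms = [], []
for syn in wn.synsets('good'):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))  # e.g. includes 'bad' and 'evil'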


1 Answer

There's no easy way to get similarity between words that are not nouns/verbs.

As noted, noun/verb similarity is easy to compute with the built-in measures, e.g.:

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.1')
>>> cat = wn.synset('cat.n.1')
>>> car = wn.synset('car.n.1')
>>> wn.path_similarity(dog, cat)
0.2
>>> wn.path_similarity(dog, car)
0.07692307692307693
>>> wn.wup_similarity(dog, cat)
0.8571428571428571
>>> wn.wup_similarity(dog, car)
0.4
>>> wn.lch_similarity(dog, car)
1.072636802264849
>>> wn.lch_similarity(dog, cat)
2.0281482472922856

For adjectives it's harder, so you would need to build your own similarity measure. The easiest way is to use a vector space model, in which every word is represented by a vector of floating-point numbers, e.g.

>>> import numpy as np
>>> blue = np.array([0.2, 0.2, 0.3])
>>> red = np.array([0.1, 0.2, 0.3])
>>> pink = np.array([0.1001, 0.221, 0.321])
>>> car = np.array([0.6, 0.9, 0.5])
>>> def cosine(x,y):
...     return np.dot(x,y) / (np.linalg.norm(x) * np.linalg.norm(y))
... 
>>> cosine(pink, red)
0.99971271929384864
>>> cosine(pink, blue)
0.96756147991512709
>>> cosine(blue, red)
0.97230558532824662
>>> cosine(blue, car)
0.91589118863996888
>>> cosine(red, car)
0.87469454283170045
>>> cosine(pink, car)
0.87482313596223782

To train a bunch of vectors for something like pink = np.array([0.1001, 0.221, 0.321]), you should try googling for:

  • Latent semantic indexing / Latent semantic analysis
  • Bag of Words
  • Vector space model semantics
  • Word2Vec, Doc2Vec, Wiki2Vec
  • Neural Nets
  • cosine similarity natural language semantics

You can also try some off-the-shelf software / libraries like the following (a minimal Gensim sketch follows the list):

  • Gensim https://radimrehurek.com/gensim/
  • http://webcache.googleusercontent.com/search?q=cache:u5y4He592qgJ:takelab.fer.hr/sts/+&cd=2&hl=en&ct=clnk&gl=sg
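As an illustration of the Word2Vec route, here is a minimal Gensim sketch (assumes gensim >= 4.0; the toy corpus only shows the mechanics, real vectors need a large corpus or pretrained embeddings):

from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of tokens
sentences = [
    ['the', 'sky', 'is', 'blue'],
    ['the', 'rose', 'is', 'red'],
    ['the', 'dress', 'is', 'pink'],
    ['the', 'car', 'is', 'fast'],
]

model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, epochs=50)

print(model.wv['pink'])                    # the learned vector for 'pink'
print(model.wv.similarity('pink', 'red'))  # cosine similarity between two words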

Other than the vector space model, you can try a graph-based model that puts words into a graph and uses something like PageRank to walk the graph and derive a similarity measure; a sketch follows.
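One hedged sketch of that idea, assuming networkx is installed: build a small word graph from WordNet's adjective relations and use personalized PageRank as a rough relatedness score. Note this measures relatedness rather than similarity, so antonyms like good/bad will score highly:

import networkx as nx
from nltk.corpus import wordnet as wn

# Build an undirected graph over adjective lemmas using synonymy, antonymy
# and "similar to" links
G = nx.Graph()
for syn in wn.all_synsets(pos='a'):
    names = syn.lemma_names()
    for lemma in syn.lemmas():
        for other in names:
            if other != lemma.name():
                G.add_edge(lemma.name(), other)    # words sharing a synset
        for ant in lemma.antonyms():
            G.add_edge(lemma.name(), ant.name())   # antonym pairs
    for sim in syn.similar_tos():
        G.add_edge(names[0], sim.lemma_names()[0]) # "similar to" satellites

# Personalized PageRank seeded at 'good': higher score = more strongly connected
scores = nx.pagerank(G, personalization={'good': 1.0})
print(scores.get('bad'))
print(scores.get('fast'))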

See also:

  • Compare similarity of terms/expressions using NLTK?
  • check if two words are related to each other
  • How to determine semantic hierarchies / relations in using NLTK?
  • Is there an algorithm that tells the semantic similarity of two phrases
  • Semantic Relatedness Algorithms - python
answered Nov 15 '22 by alvas