 

How to normalize similarity measures from WordNet

I am trying to calculate the semantic similarity between two words using WordNet-based similarity measures, namely the Resnik measure (RES), the Lin measure (LIN), the Jiang and Conrath measure (JNC), and the Banerjee and Pedersen measure (BNP).

To do that, I am using nltk and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. For that I need to normalize the values, since some measures give values between 0 and 1 while others give values greater than 1.

So, my question is: how do I normalize the similarity values obtained from different measures?

Extra detail on what I am actually trying to do: I have a set of words. I calculate the pairwise similarity between the words and remove the words that are not strongly correlated with the other words in the set.

asked Jul 31 '13 11:07 by nish





1 Answer

How to normalize a single measure

Let's consider a single arbitrary similarity measure M and take an arbitrary word w.

Define m = M(w, w). Assuming every word is most similar to itself and M(w, w) does not depend on w, m is the maximum value M can take.

Let MN denote the normalized version of M.

For any two words w, u you can compute MN(w, u) = M(w, u) / m.

It's easy to see that if M takes non-negative values, then MN takes values in [0, 1].
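The normalization above can be sketched in Python as follows. This is only an illustration under stated assumptions: `normalize` and `toy_measure` are hypothetical names, and `toy_measure` is a stand-in (letter-set overlap scaled to [0, 5]) rather than a real WordNet measure. With nltk you would pass in a function wrapping the measure you actually use, after checking that its self-similarity is finite and the same for every word.

```python
from typing import Callable

# A similarity measure takes two words and returns a number.
Measure = Callable[[str, str], float]


def normalize(measure: Measure, reference_word: str) -> Measure:
    """Return MN(w, u) = M(w, u) / m, where m = M(reference_word, reference_word)."""
    m = measure(reference_word, reference_word)
    if m <= 0:
        raise ValueError("self-similarity must be positive to normalize")

    def normalized(w: str, u: str) -> float:
        return measure(w, u) / m

    return normalized


def toy_measure(w: str, u: str) -> float:
    """Hypothetical measure for non-empty words: letter-set overlap scaled to [0, 5]."""
    a, b = set(w), set(u)
    return 5.0 * len(a & b) / len(a | b)


toy_mn = normalize(toy_measure, "cat")

print(toy_measure("cat", "car"))  # raw score, can exceed 1
print(toy_mn("cat", "car"))       # normalized score in [0, 1]
```

Any non-empty reference word works for the toy measure because its self-similarity is always 5. For a measure whose self-similarity varies by word, dividing by a single m no longer guarantees values in [0, 1].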

How to normalize a measure combined from many measures

To compute your own measure F as a combination of k different measures m_1, m_2, ..., m_k, first normalize each m_i independently using the method above, then choose weights:

alpha_1, alpha_2, ..., alpha_k

where alpha_i >= 0 is the weight of the i-th measure.

The alphas must sum to 1, i.e.:

alpha_1 + alpha_2 + ... + alpha_k = 1

Then, for any two words w and u, compute:

F(w, u) = alpha_1 * m_1(w, u) + alpha_2 * m_2(w, u) + ... + alpha_k * m_k(w, u)

Because each normalized m_i(w, u) lies in [0, 1] and the alphas sum to 1, F takes values in [0, 1] as long as no alpha is negative.
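A minimal Python sketch of the weighted combination is shown here. The names `combine`, `letter_overlap`, and `same_first_letter` are hypothetical, and the two toy measures are placeholders that already return values in [0, 1]. In practice you would pass in your normalized WordNet measures instead.

```python
from typing import Callable, Sequence

# A similarity measure takes two words and returns a number.
Measure = Callable[[str, str], float]


def combine(measures: Sequence[Measure], weights: Sequence[float]) -> Measure:
    """Return F(w, u) = sum_i alpha_i * m_i(w, u) for already-normalized m_i."""
    if len(measures) != len(weights):
        raise ValueError("need exactly one weight per measure")
    if any(alpha < 0 for alpha in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    def combined(w: str, u: str) -> float:
        return sum(alpha * m(w, u) for alpha, m in zip(weights, measures))

    return combined


# Two hypothetical measures for non-empty words, both with values in [0, 1].
def letter_overlap(w: str, u: str) -> float:
    a, b = set(w), set(u)
    return len(a & b) / len(a | b)


def same_first_letter(w: str, u: str) -> float:
    return 1.0 if w[0] == u[0] else 0.0


F = combine([letter_overlap, same_first_letter], [0.75, 0.25])

print(F("cat", "car"))  # weighted mix of the two scores
print(F("cat", "dog"))  # no overlap and different first letters
```

Validating the weights up front keeps the [0, 1] guarantee explicit: a negative or oversized weight would otherwise let F drift outside that range without any error.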

answered Nov 15 '22 00:11 by pkacprzak
