
about word2vec most_similar() function

I'm using the most_similar() method as below to get all the words similar to a given word:

similar_words = model.most_similar('apple', topn=sizeofdict)

AFAIK, this calculates the cosine similarity between the given word and all the other words in the dictionary. When I inspect the words and scores, I can see words with negative scores further down the list. What does this mean? Are they the words that have the opposite meaning to the given word?

Also, if it's using cosine similarity, how does it get a negative value? Cosine similarity varies between 0 and 1 for two documents.

samsamara asked Oct 30 '22 11:10


1 Answer

Yes, it does calculate the cosine similarity between the given word and all the other words in the vocabulary.
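To make that concrete, here is a minimal sketch of that kind of ranking in plain Python. It is not gensim's actual implementation; `cosine_similarity`, `most_similar` and the tiny `toy_vectors` table are hypothetical names and made-up numbers used only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Made-up 3-dimensional vectors (assumption: real word2vec vectors
# usually have 100+ dimensions and are learned from a corpus).
toy_vectors = {
    "apple": [0.9, 0.1, 0.3],
    "pear": [0.8, 0.2, 0.4],
    "car": [-0.2, 0.9, 0.1],
    "tax": [-0.7, -0.3, -0.5],
}

# Asking for as many results as there are words returns the whole ranking.
ranked = most_similar(toy_vectors, "apple", topn=len(toy_vectors))
for other, score in ranked:
    print(f"{other}: {score:.3f}")
```

Because the components can be negative, the words at the bottom of the ranking end up with negative scores, which is the same effect you see when `topn` covers the whole vocabulary.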

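No, a negative score doesn't mean the two words have opposite meanings. Word2vec training pulls together the vectors of words that occur in similar contexts (its objective is built on dot products between vectors, which is closely related to cosine similarity), so similar words end up clustered together on the high-dimensional sphere. A negative score only means the two vectors point in broadly different directions, i.e. the words are used in dissimilar contexts. As a rough rule of thumb, a cosine similarity above 0.6 between word vectors suggests they are similar in meaning.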

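No, cosine similarity between two vectors lies between -1 and 1. A similarity in [0, 1] implies an angle between 0 and 90 degrees; a negative similarity implies an angle between 90 and 180 degrees. The 0 to 1 range only holds for vectors with non-negative components, such as term-count or TF-IDF document vectors; word2vec vectors can have negative components.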
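The relation between the score and the angle can be checked with a small sketch. The 2-dimensional vectors and the helper names `cosine_similarity` and `angle_degrees` are assumptions chosen for illustration, not part of any library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(u, v):
    """Angle between u and v in degrees, derived from the cosine."""
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(cos))

reference = [1.0, 0.0]

# Hypothetical vectors at known angles to `reference`.
examples = {
    "same direction": [2.0, 0.0],       # 0 degrees
    "perpendicular": [0.0, 3.0],        # 90 degrees
    "obtuse angle": [-1.0, 1.0],        # 135 degrees
    "opposite direction": [-1.0, 0.0],  # 180 degrees
}

results = {
    name: (cosine_similarity(reference, vec), angle_degrees(reference, vec))
    for name, vec in examples.items()
}

for name, (cos, angle) in results.items():
    print(f"{name}: cosine={cos:.3f}, angle={angle:.1f} degrees")
```

The score turns negative as soon as the angle goes past 90 degrees, and the vector length (2.0 or 3.0 above) has no effect on it.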

kampta answered Nov 11 '22 06:11