about word2vec most_similar() function

Question

I'm using the most_similar() method as below to get all the words similar to a given word:

word,score= model.most_similar('apple',topn=sizeofdict)

AFAIK, what this does is, calculate the cosine similarity between the given word and all the other words in the dictionary. When i'm inspecting the words and scores, I can see there are words with negative score down the list. What does this mean? are them the words that has opposite meaning to the given word?

Also if it's using cosine similarity, how does it get a negative value? cosine similarity varies between 0-1 for two documents.

kampta · Accepted Answer

Yes, it does calculate cosine similarity between the given word and all the other words in the vocabulary

No, negative score doesn't mean the two words have opposite meaning. Cosine similarity is part of the cost function used in training word2vec model. The model is reducing the angle between vectors of similar words, so similar words be clustered together in the high dimensional sphere. Typically, for word vectors, cosine similarity > 0.6 means they are similar in meaning.

No, cosine similarity between two vectors lie between -1 and 1. [0, 1] similarity implies vectors having angles between 0 and 90 degrees. Negative similarity implies angles between 90 and 180 degrees.

about word2vec most_similar() function

Tags:

text-mining

gensim

word2vec

samsamara

1 Answers

kampta

Recent Activity

Donate For Us

about word2vec most_similar() function

Tags:

text-mining

gensim

word2vec

samsamara

1 Answers

kampta

Related questions

Recent Activity

Donate For Us