I'm using the most_similar() method as below to get all the words similar to a given word:
similar = model.most_similar('apple', topn=sizeofdict)  # list of (word, score) pairs
AFAIK, this calculates the cosine similarity between the given word and all the other words in the dictionary. When I inspect the words and scores, I can see words with negative scores further down the list. What does this mean? Are they words that have the opposite meaning to the given word?
Also, if it's using cosine similarity, how does it get a negative value? As far as I know, cosine similarity varies between 0 and 1 for two documents.
Yes, it calculates the cosine similarity between the given word and every other word in the vocabulary.
No, a negative score doesn't mean the two words have opposite meanings. Word2vec training increases the dot product between vectors of words that occur in similar contexts, which reduces the angle between them, so words used in similar ways cluster together on the high-dimensional sphere. A negative score only means the two vectors point more than 90 degrees apart, i.e. the words tend to occur in dissimilar contexts. As a rough rule of thumb for word vectors, cosine similarity > 0.6 suggests the words are similar in meaning.
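No, the cosine similarity between two vectors lies between -1 and 1. A similarity in [0, 1] means the angle between the vectors is between 0 and 90 degrees; a negative similarity means the angle is between 90 and 180 degrees. The 0 to 1 range holds for document vectors such as term counts or TF-IDF weights because all their components are non-negative; word2vec vectors can have negative components, so the full range is possible.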
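To make the range concrete, here is a minimal sketch in plain Python. It is not gensim's implementation: the helper names cosine_similarity and rank_by_similarity, and the 2-D "word vectors", are made up for illustration. It shows that the score is 1 for vectors pointing the same way, 0 for perpendicular ones, -1 for opposite ones, and that ranking a toy vocabulary against a query word can put negative scores at the bottom of the list.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query, vectors):
    """Score every other word against `query`, highest similarity first.

    Hypothetical stand-in for what most_similar() is described as doing above.
    """
    q = vectors[query]
    scores = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up 2-D vectors; real word2vec vectors have hundreds of dimensions.
toy_vectors = {
    "apple": [1.0, 0.2],
    "pear": [0.9, 0.3],         # small angle to "apple"
    "granite": [0.1, 1.0],      # angle below 90 degrees, low positive score
    "yesterday": [-0.8, -0.5],  # angle above 90 degrees, negative score
}

print(cosine_similarity([1, 0], [1, 0]))   # same direction: 1.0
print(cosine_similarity([1, 0], [0, 1]))   # perpendicular: 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite direction: -1.0

for word, score in rank_by_similarity("apple", toy_vectors):
    print(word, round(score, 3))
```

The negative score for the last toy word comes only from the negative components of its vector; with all-non-negative vectors, as in bag-of-words documents, the dot product and therefore the score can never drop below 0.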