Why use cosine similarity in Word2Vec when it's trained using dot-product similarity

According to several posts I found on Stack Overflow (for instance this one: Why does word2Vec use cosine similarity?), it's common practice to calculate the cosine similarity between two word vectors after we have trained a word2vec model (either CBOW or Skip-gram). However, this seems a little odd to me, since the model is actually trained with the dot product as its similarity score. One piece of evidence for this is that the norms of the word vectors we get after training are actually meaningful. So why do people still use cosine similarity instead of the dot product when calculating the similarity between two words?
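To make the difference concrete, here is a minimal NumPy sketch (the two vectors are made up, standing in for trained word2vec embeddings): cosine similarity is just the dot product divided by the product of the two vector norms, so the two measures differ only by that normalization.

```python
import numpy as np

# Hypothetical word vectors, standing in for trained word2vec embeddings.
v_king = np.array([0.8, 1.2, -0.5])
v_queen = np.array([0.6, 1.0, -0.4])

dot = np.dot(v_king, v_queen)                                    # magnitude-sensitive
cos = dot / (np.linalg.norm(v_king) * np.linalg.norm(v_queen))   # direction only

print(f"dot product:       {dot:.3f}")
print(f"cosine similarity: {cos:.3f}")
```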

asked Jan 28 '19 by Fred Zhang

People also ask

Why does Word2Vec use cosine similarity?

Cosine similarity measures whether two vectors point in the same direction, regardless of their magnitudes. For example, it makes sense for comparing bag-of-words representations of documents: two documents might have different lengths but similar distributions of words.
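A quick illustration of that length-invariance, using made-up term-count vectors: a document and a three-times-longer copy of it have a much larger dot product, but a cosine similarity of exactly 1.

```python
import numpy as np

# Term-count vectors over a shared vocabulary (hypothetical toy documents).
# doc_b is doc_a repeated three times: same word distribution, 3x the length.
doc_a = np.array([2, 1, 0, 3], dtype=float)
doc_b = 3 * doc_a

cos = np.dot(doc_a, doc_b) / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
print(np.dot(doc_a, doc_b))  # 42.0 -- grows with document length
print(cos)                   # 1.0  -- unaffected by length
```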

Why is cosine similarity better than the dot product?

Because cosine similarity is not affected by vector length, the large norms of the embeddings of popular videos do not inflate the similarity. Thus, switching from the dot product to cosine reduces the similarity scores for popular videos.

Does Word2Vec use cosine similarity?

Word2Vec is a model that represents words as vectors. The similarity between two words can then be computed by applying the cosine similarity formula to the word vectors produced by the Word2Vec model.
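For example, with gensim (assuming version 4.x, where the dimensionality parameter is called vector_size), similarity() on a trained model returns exactly this cosine similarity; the toy corpus below is only for illustration and far too small for meaningful vectors.

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; real use needs far more text.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# gensim's similarity() returns the cosine similarity of the two word vectors.
print(model.wv.similarity("cat", "dog"))
```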

Why is cosine similarity used for word embeddings?

For our case study, we used cosine similarity. This uses the word embeddings of the words in two texts to measure the distance that the words in one text need to “travel” in semantic space to reach the words in the other text. Euclidean distance between two points, by comparison, is the length of the straight path connecting them.


1 Answer

Cosine similarity and the dot product are both similarity measures, but the dot product is magnitude-sensitive while cosine similarity is not. Depending on its occurrence count, a word might have a large or small dot product with another word. We normally normalize our vectors to prevent this effect, so that all vectors have unit magnitude. But if your particular downstream task requires occurrence counts as a feature, then the dot product might be the way to go; if you do not care about counts, you can simply calculate the cosine similarity, which normalizes them for you.
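To illustrate the normalization point (a rough sketch with random stand-in vectors): once both vectors are scaled to unit length, the plain dot product and the cosine similarity coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=100)   # stand-ins for two trained word vectors
b = rng.normal(size=100)

# After unit-normalizing, the plain dot product *is* the cosine similarity.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(np.dot(a_unit, b_unit), cos))  # True
```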

answered Sep 19 '22 by shiredude95