Why does word2vec use cosine similarity?

I have been reading the word2vec papers (e.g. this one), and I think I understand how the vectors are trained to maximize the probability of other words appearing in the same contexts.

However, I do not understand why cosine is the correct measure of word similarity. Cosine similarity says that two vectors point in the same direction, but they could have different magnitudes.

For example, cosine similarity makes sense when comparing bag-of-words vectors for documents. Two documents might have different lengths but similar distributions of words.

Why not, say, Euclidean distance?

Can anyone explain why cosine similarity works for word2vec?
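
To make this concrete, here's a tiny numpy sketch of my own (not from the paper) showing how the two measures disagree when the vectors differ only in magnitude:

```python
import numpy as np

# Two vectors pointing in exactly the same direction,
# but with very different magnitudes.
u = np.array([1.0, 2.0, 3.0])
v = 10.0 * u  # same direction, 10x the length

cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
euclidean = np.linalg.norm(u - v)

print(cosine)     # 1.0   -- "identical" under cosine similarity
print(euclidean)  # ~33.7 -- "far apart" under Euclidean distance
```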

asked Jul 17 '16 by opus111

1 Answer

Those two distance metrics are probably strongly correlated, so it may not matter all that much which one you use. As you point out, cosine similarity means we don't have to worry about the lengths of the vectors at all.
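
As a quick sanity check (my own sketch, not from either paper): if you unit-normalize the vectors first, squared Euclidean distance collapses to 2 − 2·cosine, so the two measures rank neighbours identically:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=100)
v = rng.normal(size=100)

# Unit-normalize, as is typically done before comparing word vectors.
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)

cosine = np.dot(u, v)
sq_euclidean = np.sum((u - v) ** 2)

# For unit vectors: ||u - v||^2 = ||u||^2 + ||v||^2 - 2*u.v = 2 - 2*cosine
print(np.isclose(sq_euclidean, 2 - 2 * cosine))  # True
```

So the real question is only whether the raw magnitudes carry useful information.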

This paper indicates that there is a relationship between a word's frequency and the length of its word2vec vector: http://arxiv.org/pdf/1508.02297v1.pdf
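
If you want to poke at that relationship yourself, here is a rough gensim sketch (my own illustration; the toy corpus and hyperparameters are arbitrary, and you would need a real corpus to see a meaningful trend) that prints each word's training count next to its vector norm:

```python
import numpy as np
from gensim.models import Word2Vec  # gensim 4.x API

# Toy corpus -- repeated so word2vec has something to train on.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"],
] * 100

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=0)

# Compare each word's corpus frequency with the norm of its vector.
for word in model.wv.index_to_key:
    count = model.wv.get_vecattr(word, "count")
    norm = np.linalg.norm(model.wv[word])
    print(f"{word:8s} count={count:4d} norm={norm:.3f}")
```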

answered Nov 13 '22 by Aaron