Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the best cosine similarity in a set of vectors

I have n vectors, each with m elements (real number). I want to find the pair where there cosine similarity is maximum among all pairs.

The straightforward solution would require O(n2m) time.

Is there any better solution?

update

Cosine similarity / distance and triangle equation Inspires me that I could replace "cosine similarity" with "chord length" which loses precision but increases speed a lot. ( there are many existing solutions solving Nearest Neighbor in metric space, like ANN )

like image 244
hs3180 Avatar asked Dec 01 '12 16:12

hs3180


People also ask

Is higher or lower cosine similarity better?

The cosine similarity is advantageous because even if the two similar documents are far apart by the Euclidean distance (due to the size of the document), chances are they may still be oriented closer together. The smaller the angle, higher the cosine similarity.

How do you find the accuracy of cosine similarity?

2.4. Cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.

What is a good cosine similarity score?

The higher similarity, the lower distances. When you pick the threshold for similarities for text/documents, usually a value higher than 0.5 shows strong similarities.


1 Answers

Cosine similarity sim(a,b) is related to Euclidean distance |a - b| by

|a - b|² = 2(1 - sim(a,b))

for unit vectors a and b.

That means cosine similarity is greatest when Euclidean distance is smallest after normalizing by the L2 norm, and the problem reduces to the closest pair of points problem, which can be solved in O(n lg n) time.

like image 129
Fred Foo Avatar answered Oct 18 '22 22:10

Fred Foo