Cosine similarity when one of the vectors is all zeros

Question

How should cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity) be defined when one of the vectors is all zeros?

v1 = [1, 1, 1, 1, 1]

v2 = [0, 0, 0, 0, 0]

When we calculate according to the classic formula we get division by zero:

Let d1 = 0 0 0 0 0 0
Let d2 = 1 1 1 1 1 1

Cosine Similarity (d1, d2) = dot(d1, d2) / (||d1|| * ||d2||)

dot(d1, d2) = (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) = 0

||d1|| = sqrt((0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2) = 0

||d2|| = sqrt((1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2) = 2.44948974278

Cosine Similarity (d1, d2) = 0 / ((0) * (2.44948974278))
                           = 0 / 0
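
For reference, a direct translation of the formula into Python fails in the same way. This is only a minimal sketch; the function name `cosine_similarity` is illustrative and not taken from any library.

```python
import math

def cosine_similarity(a, b):
    """Textbook cosine similarity with no special handling for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Raises ZeroDivisionError when either norm is 0.
    return dot / (norm_a * norm_b)

d1 = [0, 0, 0, 0, 0, 0]
d2 = [1, 1, 1, 1, 1, 1]

try:
    cosine_similarity(d1, d2)
    raised = False
except ZeroDivisionError:
    raised = True

print("division by zero:", raised)
```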

I want to use this similarity measure in a clustering application, and I will often need to compare such vectors, including [0, 0, 0, 0, 0] vs. [0, 0, 0, 0, 0].

Do you have any experience with this? Since this is a similarity (not a distance) measure, should I use special cases such as

d( [1, 1, 1, 1, 1]; [0, 0, 0, 0, 0] ) = 0

d([0, 0, 0, 0, 0]; [0, 0, 0, 0, 0] ) = 1

And what about

d([1, 1, 1, 0, 0]; [0, 0, 0, 0, 0] ) = ? etc.

Gyro Gearloose · Accepted Answer

It is undefined.

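Suppose that in place of your zero vector you have a nonzero vector C. Multiply it by epsilon > 0 and let epsilon go to zero. Scaling C by a positive factor does not change the cosine, so the limiting value depends only on the direction of C. Different choices of C give different limits, so the function cannot be extended continuously to the case where one of the vectors is zero.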
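
The limit argument can be checked numerically. The sketch below is only an illustration: the three directions chosen for C and the helper name `cosine_similarity` are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = [1, 1, 1, 1, 1]

# Hypothetical choices for the nonzero vector C.
directions = {
    "same direction": [1, 1, 1, 1, 1],
    "orthogonal": [1, -1, 0, 0, 0],
    "opposite": [-1, -1, -1, -1, -1],
}

results = {}
for name, c in directions.items():
    # Shrink C towards the zero vector; the similarity does not move.
    results[name] = [
        cosine_similarity(v1, [eps * x for x in c])
        for eps in (1.0, 1e-3, 1e-9)
    ]
    print(name, [round(s, 6) for s in results[name]])
```

Each direction keeps its own value (about 1, 0 and -1 respectively) however small epsilon gets, so no single value at the zero vector agrees with all of them.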

Has QUIT--Anony-Mousse · Answer

If you have zero vectors, cosine is the wrong similarity function for your application.

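Cosine distance is essentially equivalent to squared Euclidean distance on L_2-normalized data. That is, you normalize every vector to unit length, then compute squared Euclidean distance; the result is the cosine distance up to a constant factor of 2. A zero vector cannot be scaled to unit length, which is why it does not fit this picture.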
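
A small numeric check of that equivalence, as a sketch: for unit vectors the squared Euclidean distance equals 2 * (1 - cosine similarity). The helper names and the sample vectors `a` and `b` are made up for the illustration.

```python
import math

def normalize(v):
    """Scale v to unit L2 length (not defined for the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    a_hat, b_hat = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a_hat, b_hat))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = [1.0, 2.0, 0.0, 3.0]
b = [2.0, 0.0, 1.0, 1.0]

lhs = squared_euclidean(normalize(a), normalize(b))
rhs = 2.0 * cosine_distance(a, b)
print(math.isclose(lhs, rhs))
```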

The other benefit of cosine is performance: computing it on very sparse, high-dimensional data is faster than computing Euclidean distance. It benefits from sparsity quadratically, not just linearly.
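
One way to read the sparsity remark, sketched under assumptions: if vectors are stored as index-to-value dicts, only coordinates that are nonzero in both vectors contribute to the dot product. The dict representation, the function names and the sample data are all illustrative, not something the answer prescribes.

```python
import math

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    # Iterate over the smaller dict; only shared nonzero indices contribute.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)

def sparse_cosine(a, b):
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return sparse_dot(a, b) / (norm_a * norm_b)

# Two hypothetical documents in a very high-dimensional vocabulary space.
doc1 = {3: 1.0, 1000: 2.0, 52000: 1.0}
doc2 = {1000: 1.0, 7: 4.0}

similarity = sparse_cosine(doc1, doc2)
print(round(similarity, 6))
```

In a real pipeline the norms would usually be computed once per vector rather than on every comparison. An empty dict (the all-zeros vector) still hits the same division by zero.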

While you obviously can try to hack the similarity to be 0 when exactly one is zero, and maximal when they are identical, it won't really solve the underlying problems.
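
For completeness, the hack would look roughly like the sketch below. The function name and the exact convention (0 when exactly one vector is zero, 1 when both are) are assumptions matching the special cases proposed in the question, not a recommended definition.

```python
import math

def cosine_similarity_with_zero_rule(a, b):
    """Cosine similarity with an ad hoc convention for zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 and norm_b == 0.0:
        return 1.0  # convention: two zero vectors count as identical
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention: a zero vector is similar to nothing else
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)

zero = [0, 0, 0, 0, 0]
print(cosine_similarity_with_zero_rule([1, 1, 1, 1, 1], zero))
print(cosine_similarity_with_zero_rule(zero, zero))
print(cosine_similarity_with_zero_rule([1, 1, 1, 0, 0], zero))
```

The returned values are pure convention: the function jumps at the zero vector, so a clustering algorithm sees a discontinuity there.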

Don't choose the distance by what you can easily compute.

Instead, choose the distance such that the result has a meaning on your data. If the value is undefined, you don't have a meaning...

Sometimes, it may work to discard constant-0 data as meaningless data anyway (e.g. analyzing Twitter noise, and seeing a Tweet that is all numbers, no words). Sometimes it doesn't.
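
If discarding is acceptable for your data, a preprocessing step could look like the following sketch. The function name `split_zero_vectors` and the sample data are hypothetical; keeping the dropped indices lets you report or handle those records separately.

```python
def split_zero_vectors(vectors):
    """Separate all-zero vectors from the rest before clustering."""
    kept, dropped_indices = [], []
    for index, vector in enumerate(vectors):
        if any(x != 0 for x in vector):
            kept.append(vector)
        else:
            dropped_indices.append(index)
    return kept, dropped_indices

data = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

kept, dropped = split_zero_vectors(data)
print(kept)
print(dropped)
```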

Tags:

machine-learning

cluster-analysis

data-mining

cosine-similarity

Sebastian Widz
