
Euclidean distance vs cosine distance in sparse vectors - How come Euclidean performs better?

I have a dataset of very sparse vectors, df (over 95% zeros), and I am measuring the distance between each of its rows and another sparse vector, sample.

Since I'm dealing with very sparse vectors, I assumed cosine distance would be calculated much faster than Euclidean distance, but that doesn't seem to be the case.

Is this normal behavior? Or am I doing something wrong? Or maybe it's not even true that cosine distance is more efficient for sparse vectors?

(all_distances is a dict containing many types of distances, but the only ones we are talking about here are scipy.spatial.distance.euclidean and scipy.spatial.distance.cosine.)

My code

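from time import time
from statistics import mean  # assumed source of mean(); numpy.mean would also work

for d_name, d_func in all_distances.items():
    tot_time = []
    for i in range(100):
        start_time = time()
        # Distance from every row of df to the sample vector
        df['distance'] = df.apply(d_func, axis=1, args=(sample,))
        df.sort_values(by='distance', ascending=True, inplace=True)
        # Drop the helper column so the next run starts from the same columns
        df.drop('distance', axis=1, inplace=True)
        df = df.reset_index(drop=True)
        tot_time.append(time() - start_time)

    print("Mean time for {}: {}s".format(d_name, round(mean(tot_time), 4)))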

Result:

Mean time for cosine: 0.8034s

Mean time for euclidean: 0.708s

bluesummers asked Nov 26 '25 23:11


1 Answer

Cosine similarity needs the norms of both input vectors, as well as the dot product between them:

cos(theta) = dot(a,b) / (norm(a) * norm(b))

So, even though the dot product only accumulates when both a[i] and b[i] are nonzero, you still need to accumulate the norm for both a and b, which itself is about as much work as accumulating the Euclidean distance.
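To make the operation count concrete, here is a minimal sketch in plain Python. It assumes a sparse vector is stored as a dict mapping index to nonzero value, which is my own illustrative representation and not what scipy or pandas use internally. The function names are hypothetical too.

```python
import math

def sparse_euclidean(a, b):
    # Visits every index that is nonzero in either vector.
    total = 0.0
    for i in a.keys() | b.keys():
        diff = a.get(i, 0.0) - b.get(i, 0.0)
        total += diff * diff
    return math.sqrt(total)

def sparse_cosine_distance(a, b):
    # Dot product: only indices that are nonzero in both vectors contribute.
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    # Norms: each one still needs a full pass over that vector's nonzero entries.
    # Assumes neither vector is all zeros (otherwise this divides by zero).
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical example vectors, stored as {index: value}
a = {1: 3.0, 4: 4.0}
b = {3: 5.0, 4: 12.0}

print(sparse_euclidean(a, b))
print(sparse_cosine_distance(a, b))
```

Both functions touch every nonzero entry of a and b once. The cosine version additionally does the shared-index products, two square roots instead of one, and a division.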

Either way, most of the work is in iterating through the vectors, and your timings show that the difference between the two is small (about 0.80s versus 0.71s). A plausible explanation for the remaining gap is that the cosine computation does slightly more arithmetic.
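One related point: as far as I know, scipy.spatial.distance.cosine and scipy.spatial.distance.euclidean work on dense 1-D arrays, so a row-by-row df.apply would not skip the zeros for either metric. If you compare many rows against one sample, part of the cosine overhead can be amortized by computing the sample's norm once. The sketch below assumes the same dict-based sparse representation (index to nonzero value) and uses hypothetical helper names. It is only an illustration, not a drop-in replacement for your loop.

```python
import math

def norm(v):
    # Euclidean norm over the nonzero entries only.
    return math.sqrt(sum(x * x for x in v.values()))

def rank_by_cosine(rows, sample):
    # The sample's norm is the same for every row, so compute it once.
    sample_norm = norm(sample)
    scored = []
    for idx, row in enumerate(rows):
        # Only indices that are nonzero in both vectors contribute to the dot product.
        dot = sum(x * sample[i] for i, x in row.items() if i in sample)
        # Assumes no row is all zeros (otherwise this divides by zero).
        scored.append((1.0 - dot / (norm(row) * sample_norm), idx))
    # Smallest cosine distance first.
    return sorted(scored)

# Hypothetical data: three sparse rows and one sample vector
rows = [{0: 1.0}, {0: 1.0, 1: 1.0}, {1: 2.0}]
sample = {1: 1.0}

ranking = rank_by_cosine(rows, sample)
print([idx for _, idx in ranking])
```

Each row's own norm still has to be computed, which is why the per-row cost stays in the same ballpark as a Euclidean pass.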

comingstorm answered Nov 29 '25 15:11


