Cosine similarity between matching rows in numpy ndarrays

Question

I have two ndarrays of shape (n_samples, n_dimensions), and I want the cosine similarity for each pair of corresponding rows, so the output would have shape (n_samples,).

Using sklearn's implementation I get an (n_samples, n_samples) result, which involves a lot of irrelevant calculations (every row against every other row) and is unacceptable in my case.

Using 1 minus scipy's cosine distance is not possible either, because it expects 1-D vectors and not matrices.

What would be the most efficient way to execute what I'm looking for?

cs95 · Accepted Answer

Assuming the two arrays x and y have the same shape:

1. Compute the row-wise dot product using np.einsum.
2. Compute the product of the L2 (Euclidean) norms of each row of x and y.
3. Divide the result from (1) by the result from (2).

import numpy as np

def matrix_cosine(x, y):
    # Row-wise dot products divided by the product of the row L2 norms.
    return np.einsum('ij,ij->i', x, y) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    )
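For readers unfamiliar with the einsum subscript: 'ij,ij->i' multiplies the two arrays element-wise and sums over the column index j, leaving one value per row. The small sketch below checks that this matches the more explicit (a * b).sum(axis=1). The arrays a and b are made-up values for illustration, not part of the original answer.

```python
import numpy as np

# Small illustrative arrays (hypothetical values, chosen for easy mental arithmetic).
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# Row-wise dot products via einsum: sum over j of a[i, j] * b[i, j].
row_dots_einsum = np.einsum('ij,ij->i', a, b)

# The same quantity written as an element-wise product followed by a row sum.
row_dots_explicit = (a * b).sum(axis=1)

print(row_dots_einsum)  # [17. 53.]
assert np.allclose(row_dots_einsum, row_dots_explicit)
```

Both forms give the same numbers; which one is faster may depend on array sizes and the NumPy build, so it is worth timing on your own data.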

And a little code to time and test it:

x = np.random.randn(100000, 100)

%timeit matrix_cosine(x, x)
82.8 ms ± 2.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

assert np.allclose(matrix_cosine(x, x), np.ones(x.shape[0]))
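The assertion above only compares x with itself. As a further sanity check, the sketch below compares the vectorized function against a plain per-row loop on two different random arrays. The helper name cosine_loop, the seed, and the array sizes are assumptions made for illustration. Note also that a row of all zeros has norm 0, so the division would produce nan for that row; handling that case is not covered here.

```python
import numpy as np

def matrix_cosine(x, y):
    # Same function as in the answer, repeated so this block runs on its own.
    return np.einsum('ij,ij->i', x, y) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    )

def cosine_loop(x, y):
    # Hypothetical reference implementation: one pair of rows at a time.
    return np.array([
        np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        for u, v in zip(x, y)
    ])

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = rng.standard_normal((1000, 100))
y = rng.standard_normal((1000, 100))

result = matrix_cosine(x, y)
assert result.shape == (1000,)                  # one similarity per row pair
assert np.allclose(result, cosine_loop(x, y))   # agrees with the slow loop
```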

Tags: python, arrays, numpy, distance, cosine-similarity

bluesummers
