I have two ndarrays of shape (n_samples, n_dimensions) and I want the cosine similarity for each pair of corresponding rows, so the output would have shape (n_samples,).
Using sklearn's implementation I get an (n_samples, n_samples) result, which performs a lot of irrelevant calculations and is unacceptable in my case.
Using 1 minus scipy's cosine distance is not possible because it expects vectors, not matrices.
What would be the most efficient way to execute what I'm looking for?
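Assuming the two arrays x and y have the same shape, you can use np.einsum (reference) to compute the dot product of each pair of corresponding rows of x and y, then divide by the product of the row norms:

import numpy as np

def matrix_cosine(x, y):
    # Row-wise dot products divided by the product of the row norms
    return np.einsum('ij,ij->i', x, y) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    )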
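One thing the function above does not handle is a row whose norm is zero: the division then produces nan along with a runtime warning. If that can happen in your data, a guarded variant might look like the sketch below. The name matrix_cosine_safe, the eps threshold, and the choice to return 0.0 for such rows are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def matrix_cosine_safe(x, y, eps=1e-12):
    """Row-wise cosine similarity; rows with a (near-)zero norm yield 0.0."""
    # Dot product of each pair of corresponding rows, shape (n_samples,)
    dots = np.einsum('ij,ij->i', x, y)
    # Product of the row norms, shape (n_samples,)
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    # Divide only where the denominator is safely non-zero; other entries stay 0.0
    out = np.zeros_like(dots, dtype=float)
    np.divide(dots, norms, out=out, where=norms > eps)
    return out

x = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
y = np.array([[1.0, 0.0], [1.0, 2.0], [-1.0, -1.0]])
result = matrix_cosine_safe(x, y)
```

Here the second row of x is all zeros, so its entry in result is 0.0 rather than nan. Whether 0.0 is the right convention depends on how the similarities are used downstream.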
And a little code to test (the %timeit line is an IPython magic, so run it in IPython or Jupyter):
x = np.random.randn(100000, 100)
%timeit matrix_cosine(x, x)
82.8 ms ± 2.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
assert np.allclose(matrix_cosine(x, x), np.ones(x.shape[0]))
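The check above only compares x with itself, where every similarity is 1. As a further sanity check, the sketch below compares the einsum version against a slow per-row loop on two different random arrays. The array sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

def matrix_cosine(x, y):
    # Same function as in the answer, repeated so this snippet runs on its own
    return np.einsum('ij,ij->i', x, y) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    )

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 100))
y = rng.standard_normal((1000, 100))

# Slow reference: cosine similarity computed one row pair at a time
expected = np.array([
    np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    for a, b in zip(x, y)
])

result = matrix_cosine(x, y)
assert result.shape == (x.shape[0],)
assert np.allclose(result, expected)
```

If both assertions pass, the vectorized result agrees with the row-by-row computation while avoiding the (n_samples, n_samples) matrix.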