In Python, is there an efficient, vectorized way to calculate the cosine distance of a sparse array u to a sparse matrix v, resulting in an array of elements [1, 2, ..., n] corresponding to cosine(u,v[0]), cosine(u,v[1]), ..., cosine(u, v[n])?
Not natively. However, the SciPy library can compute the cosine distance between two vectors for you: http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html. You can build a version that takes a matrix using this as a stepping stone.
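One way to sketch such a matrix version is shown here. Note that it swaps in scipy.spatial.distance.cdist rather than calling the two-vector function repeatedly, and it assumes dense NumPy arrays; the helper name cosine_to_rows and the sample data are made up for illustration. The loop at the end only cross-checks the result against scipy.spatial.distance.cosine.

```python
import numpy as np
from scipy.spatial.distance import cdist, cosine


def cosine_to_rows(u, v):
    """Cosine distance from 1-D array u to every row of 2-D array v (dense inputs)."""
    # cdist expects two 2-D arrays, so u becomes a single-row matrix
    return cdist(u[None, :], v, metric="cosine")[0]


# Hypothetical sample data
u = np.array([1.0, 0.0, 1.0])
v = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])

distances = cosine_to_rows(u, v)

# Cross-check against the one-pair-at-a-time function from the link above
looped = np.array([cosine(u, row) for row in v])
```

Because cdist works on dense arrays, a sparse matrix would have to be converted first, which may not be practical for large inputs.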
Add the vector onto the end of the matrix, calculate a pairwise distance matrix using sklearn.metrics.pairwise_distances(), and then extract the relevant column/row.
So for vector v (with shape (D,)) and matrix m (with shape (N,D)) do:
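import numpy as np
from sklearn.metrics import pairwise_distances
# Append v as an extra (last) row of the matrix
new_m = np.concatenate([m, v[None, :]], axis=0)
# Full (N+1) x (N+1) matrix of pairwise cosine distances
distance_matrix = pairwise_distances(new_m, metric="cosine")
# Last row holds the distances from v; drop the final entry (v to itself)
distances = distance_matrix[-1, :-1]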
Not ideal, but better than iterating!
This method can be extended if you are querying more than one vector. To do this, a list of vectors can be concatenated instead.
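A minimal sketch of that extension, assuming the inputs are SciPy sparse matrices as in the question, so scipy.sparse.vstack stands in for np.concatenate. The sample data and the names queries, stacked and direct are illustrative only. The last line shows an alternative that passes both matrices to pairwise_distances and so never builds the full square matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics import pairwise_distances

# Hypothetical data: N=3 rows in m, Q=2 query vectors, D=3 columns
m = sparse.csr_matrix(np.array([[1.0, 0.0, 1.0],
                                [0.0, 1.0, 0.0],
                                [1.0, 1.0, 0.0]]))
queries = sparse.csr_matrix(np.array([[1.0, 0.0, 1.0],
                                      [0.0, 2.0, 0.0]]))
n_queries = queries.shape[0]

# Stack the query rows underneath m, keeping everything sparse
stacked = sparse.vstack([m, queries]).tocsr()

# (N+Q) x (N+Q) matrix of pairwise cosine distances (returned as a dense array)
distance_matrix = pairwise_distances(stacked, metric="cosine")

# Last Q rows with the last Q columns removed: shape (Q, N)
distances = distance_matrix[-n_queries:, :-n_queries]

# Alternative: compute only the (Q, N) block by passing both matrices
direct = pairwise_distances(queries, m, metric="cosine")
```

Row i of distances holds the cosine distance from query i to each row of m. The two-argument call gives the same block and is likely the cheaper option when N is large.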