It looks like the cosine distance in scipy.spatial.distance.cdist (link 1),
1 - (u · v) / (||u|| ||v||),
is different from sklearn.metrics.pairwise.cosine_similarity (link 2), which is
(u · v) / (||u|| ||v||).
Does anybody know the reason for the different definitions?
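The two formulas in the question can be sketched side by side in NumPy. This is a minimal illustration, not the libraries' own code: `cosine_similarity_matrix` and `cosine_distance_matrix` are hypothetical helpers, the input vectors are made up, and all-zero rows are assumed not to occur because the sketch does not handle them.

```python
import numpy as np


def cosine_similarity_matrix(X, Y):
    """(u . v) / (||u|| ||v||) for every row u of X and every row v of Y.

    Hypothetical helper mirroring the quantity described for
    sklearn.metrics.pairwise.cosine_similarity; assumes no all-zero rows.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scale each row to unit length; the dot product of unit vectors is the cosine.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T


def cosine_distance_matrix(X, Y):
    """1 - (u . v) / (||u|| ||v||), the quantity described for cdist(X, Y, "cosine").

    Hypothetical helper with the same no-zero-rows assumption.
    """
    return 1.0 - cosine_similarity_matrix(X, Y)


X = [[1.0, 2.0]]
Y = [[2.0, 1.0], [2.0, 4.0], [-1.0, -2.0]]
S = cosine_similarity_matrix(X, Y)  # similarities 0.8, 1.0, -1.0 (up to rounding)
D = cosine_distance_matrix(X, Y)    # distances 0.2, 0.0, 2.0 (up to rounding)
```

The only difference between the two helpers is the `1 -`; the underlying cosine is the same.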
The Euclidean distance is the L2 norm of the difference between two vectors. The cosine similarity, by contrast, is proportional to the dot product of the two vectors and inversely proportional to the product of their magnitudes.
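For vectors scaled to unit length the two notions are directly linked: expanding ||u - v||² = ||u||² + ||v||² - 2 (u · v) gives ||u - v||² = 2 (1 - cos(u, v)) when ||u|| = ||v|| = 1. The quick NumPy check below uses random vectors chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=5)
v = rng.normal(size=5)

euclid = np.linalg.norm(u - v)  # L2 norm of the difference
cos_sim = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# After scaling both vectors to unit length, the squared Euclidean distance
# equals 2 * (1 - cosine similarity), i.e. twice the cosine distance.
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)
unit_euclid_sq = np.linalg.norm(u_hat - v_hat) ** 2
print(np.isclose(unit_euclid_sq, 2 * (1 - cos_sim)))  # True
```

Without the normalization step, `euclid` also depends on the vectors' lengths, which is exactly what the cosine ignores.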
The cosine similarity between two vectors x and y is cos(x, y) = (x · y) / (||x|| ||y||), where x · y is the dot product of x and y and ||x|| is the Euclidean (L2) norm of x.
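A worked instance of this formula, with two vectors picked purely for illustration:

```latex
x = (1, 2), \quad y = (2, 1)
\cos(x, y) = \frac{1 \cdot 2 + 2 \cdot 1}{\sqrt{1^2 + 2^2}\,\sqrt{2^2 + 1^2}}
           = \frac{4}{\sqrt{5}\,\sqrt{5}} = \frac{4}{5} = 0.8
\text{cosine distance} = 1 - 0.8 = 0.2
```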
Cosine similarity measures the similarity between two vectors of an inner product space. It is the cosine of the angle between the two vectors and indicates whether they point in roughly the same direction. It is often used to measure document similarity in text analysis.
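As a rough illustration of the text-analysis use, the sketch below treats each document as a vector of raw word counts (a simple bag-of-words assumption; real pipelines usually add proper tokenization and TF-IDF weighting). `cosine_similarity_text` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter


def cosine_similarity_text(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents as word-count vectors.

    Hypothetical helper: lower-cased whitespace tokens, no stemming, no TF-IDF.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_a * norm_b)


print(cosine_similarity_text("the cat sat", "the cat sat on the mat"))  # about 0.8165
```

Because only the direction of the count vector matters, a short document and a longer one with similar word proportions can still score close to 1.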
Good question. Yes, these are two different things, but they are connected by the following equation:
Cosine_distance = 1 - cosine_similarity
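The relation can be checked directly with the two functions from the question. This sketch assumes SciPy and scikit-learn are installed; the random input matrices are only for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 3))
Y = rng.normal(size=(5, 3))

dist = cdist(X, Y, metric="cosine")  # SciPy: 1 - (u . v) / (||u|| ||v||)
sim = cosine_similarity(X, Y)        # scikit-learn: (u . v) / (||u|| ||v||)

print(np.allclose(dist, 1 - sim))    # True
```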
Why?
Usually, people use the cosine similarity as a similarity measure between vectors; a distance can then be defined as 1 - cosine_similarity.
The intuition is that if two vectors point in exactly the same direction (angle = 0), their similarity is 1 and their distance is therefore 0 (1 - 1 = 0).
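Because only the angle matters, a vector and a scaled copy of it also have cosine distance 0, even though their Euclidean distance can be large. A minimal NumPy sketch, where `cosine_distance` is a hypothetical helper assuming non-zero vectors:

```python
import numpy as np


def cosine_distance(u, v):
    """1 - (u . v) / (||u|| ||v||) for two non-zero 1-D vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


u = np.array([1.0, 2.0, 3.0])
same = cosine_distance(u, u)         # identical vectors: 0 up to rounding
scaled = cosine_distance(u, 10 * u)  # same direction, 10x longer: also ~0
euclid = np.linalg.norm(u - 10 * u)  # Euclidean distance is 9 * sqrt(14), about 33.67
```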
Cosine similarity ranges from −1 (exactly opposite directions) through 0 (orthogonal) to 1 (exactly the same direction).
The cosine distance, 1 - cosine_similarity, therefore ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite directions).
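The three landmark cases of that range can be reproduced in plain Python. `cos_sim` is a hypothetical helper that assumes non-zero vectors, and the example vectors are chosen so every result is exact.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two non-zero vectors (hypothetical helper, plain Python)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


cases = {
    "same direction": ((1.0, 0.0), (3.0, 0.0)),
    "orthogonal": ((1.0, 0.0), (0.0, 2.0)),
    "opposite": ((1.0, 0.0), (-1.0, 0.0)),
}
for name, (u, v) in cases.items():
    s = cos_sim(u, v)
    print(f"{name}: similarity {s}, distance {1 - s}")
# same direction: similarity 1.0, distance 0.0
# orthogonal: similarity 0.0, distance 1.0
# opposite: similarity -1.0, distance 2.0
```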
References: SciPy; Wolfram.