What is the difference between scikit-learn's sklearn.metrics.pairwise.cosine_similarity and sklearn.metrics.pairwise.pairwise_distances(..., metric="cosine")?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Two short documents to compare
documents = (
    "Macbook Pro 15' Silver Gray with Nvidia GPU",
    "Macbook GPU"
)

# Build the TF-IDF matrix (one row per document)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Cosine similarity between document 0 and document 1
print(cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)[0, 1])
# Output: 0.37997836

# Cosine distance between the same two documents
print(pairwise_distances(tfidf_matrix[0:1], tfidf_matrix, metric='cosine')[0, 1])
# Output: 0.62002164
Why are these different?
From the source code documentation:
Cosine distance is defined as 1.0 minus the cosine similarity.
So your results make sense: 1 - 0.37997836 = 0.62002164.
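To check the relationship quoted above, here is a minimal sketch that compares the two functions on a small dense array. The array X is a made-up toy input (an assumption for illustration, not the TF-IDF matrix from the question), and relation_holds is just a name chosen here.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical toy vectors, not the TF-IDF rows from the question.
X = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 1.0],
    [3.0, 0.0, 1.0],
])

# Pairwise cosine similarity between all rows of X.
similarity = cosine_similarity(X)

# Pairwise cosine distance between all rows of X.
distance = pairwise_distances(X, metric="cosine")

# Compare the distance matrix with 1 - similarity, up to floating-point error.
relation_holds = np.allclose(distance, 1.0 - similarity)
print(relation_holds)
```

If the check prints True, the two calls carry the same information and differ only in whether a larger number means "more alike" or "further apart".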
pairwise_distances returns the distance between two arrays, so a larger distance means less similarity. With metric="cosine", cosine similarity equals 1 minus that distance, so a larger cosine similarity means the two arrays are more similar.
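The inverse relationship can also be seen by computing both quantities by hand. The sketch below uses only the standard library; the helper names cosine_similarity_1d and cosine_distance_1d and the three example vectors are hypothetical, chosen for illustration, and are not part of scikit-learn.

```python
import math

def cosine_similarity_1d(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance_1d(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity_1d(a, b)

# Hypothetical example vectors.
query = [1.0, 1.0, 0.0]
close = [2.0, 2.0, 0.0]  # same direction as query
far = [0.0, 0.0, 5.0]    # orthogonal to query

sim_close = cosine_similarity_1d(query, close)
sim_far = cosine_similarity_1d(query, far)
dist_close = cosine_distance_1d(query, close)
dist_far = cosine_distance_1d(query, far)

# The vector with the higher similarity has the lower distance.
print(sim_close > sim_far, dist_close < dist_far)
```

The vector pointing in the same direction as the query gets a similarity near 1 and a distance near 0, while the orthogonal vector gets a similarity of 0 and a distance of 1.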