Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas: Finding cosine similarity of two columns

Suppose I have two columns in a python pandas.DataFrame:

          col1 col2
item_1    158  173
item_2     25  191
item_3    180   33
item_4    152  165
item_5     96  108

What's the best way to take the cosine similarity of these two columns?

like image 760
hlin117 Avatar asked Sep 09 '14 04:09

hlin117


People also ask

How do you find the cosine similarity between two columns in Python?

First, you concatenate 2 columns of interest into a new data frame. Then you drop NaN. After that those 2 columns have only corresponding rows, and you can compare them with cosine distance or any other pairwise distance you wish.

How do you find cosine similarity in Python?

We use the below formula to compute the cosine similarity. where A and B are vectors: A.B is dot product of A and B: It is computed as sum of element-wise product of A and B. ||A|| is L2 norm of A: It is computed as square root of the sum of squares of elements of the vector A.

How do you find the difference between two columns in pandas?

Difference between rows or columns of a pandas DataFrame object is found using the diff() method. The axis parameter decides whether difference to be calculated is between rows or between columns. When the periods parameter assumes positive values, difference is found by subtracting the previous row from the next row.


2 Answers

Is that what you're looking for?

from scipy.spatial.distance import cosine
from pandas import DataFrame


df = DataFrame({"col1": [158, 25, 180, 152, 96],
                "col2": [173, 191, 33, 165, 108]})

print(1 - cosine(df["col1"], df["col2"]))
like image 177
xbello Avatar answered Sep 27 '22 17:09

xbello


You can also use cosine_similarity or other similarity metrics from sklearn.metrics.pairwise.

from sklearn.metrics.pairwise import cosine_similarity

cosine_similarity(df.col1, df.col2)
Out[4]: array([[0.7498213]])
like image 28
Amir Imani Avatar answered Sep 27 '22 18:09

Amir Imani