Can I use cosine similarity between rows using only non null values?

Question

I want to find the cosine similarity (or Euclidean distance, if that is easier) between one query row and 10 other rows. These rows are full of NaN values, so any column that is NaN in either row should be ignored.

For example, query :

A   B   C   D   E   F
3   2  NaN  5  NaN  4

df =

A   B   C   D   E   F
2   1   3  NaN  4   5
1  NaN  2   4  NaN  3
.   .   .   .   .   .
.   .   .   .   .   .

So I just want to get the cosine similarity over every non-null column that the query and each row from df have in common. For row 0 in df, that means A, B, and F, which are non-null in both the query and the row.

I then want to print the cosine similarity for each row.

Thanks in advance

Mattie Knebel-Langford · Accepted Answer

For Euclidean distance, see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.nan_euclidean_distances.html. This ignores NaNs in its calculations.
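As a rough sketch of how that function could be applied to the example in the question (the variable names `query`, `df`, and `euclid` are just illustrative): note that, per the scikit-learn documentation, the squared distance over the shared columns is scaled up by (total columns / shared columns), so the result is not the plain Euclidean distance over the shared columns.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import nan_euclidean_distances

cols = list("ABCDEF")
query = pd.DataFrame([[3, 2, np.nan, 5, np.nan, 4]], columns=cols)
df = pd.DataFrame(
    [[2, 1, 3, np.nan, 4, 5],
     [1, np.nan, 2, 4, np.nan, 3]],
    columns=cols,
)

# Result has shape (1, n_rows): distance from the query to every row of df.
# Only coordinates present in both rows contribute; the squared sum is then
# rescaled by (total columns / shared columns) before the square root.
euclid = nan_euclidean_distances(query.to_numpy(), df.to_numpy())[0]

for idx, dist in zip(df.index, euclid):
    print(idx, dist)
```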

For cosine similarity, you cannot simply fillna as this will change your similarity score. Instead, take subsets of your df and calculate the cosine similarity across columns that do not contain null values.

For your example dataframe, this would calculate cosine similarity across all rows using just columns A and F, across the query and the first df row using A, B, and F, and across the query and the second df row using A, D, and F. You would then need some way of ranking the results to decide which score to use.

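from scipy.spatial.distance import pdist

# Collect each row's non-null column names; the set drops duplicate combinations.
# (df is assumed to include the query row.)
combinations = {tuple(row.dropna().index) for _, row in df.iterrows()}

for cols in combinations:
    # Keep only the rows that are complete on this column subset.
    subset = df[list(cols)].dropna()
    # pdist returns pairwise cosine distances; similarity = 1 - distance.
    print(cols, 1 - pdist(subset, metric='cosine'))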
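If the goal is just one score per row against the query, a more direct alternative is to mask each pair down to the columns they share and compute the cosine similarity on that subset. This is only a sketch, not part of the answer above: `masked_cosine_similarity` is a hypothetical helper, and it assumes the query and df have the same columns in the same order.

```python
import numpy as np
import pandas as pd

def masked_cosine_similarity(query, row):
    """Cosine similarity over the positions that are non-NaN in both inputs."""
    q = np.asarray(query, dtype=float)
    r = np.asarray(row, dtype=float)
    shared = ~np.isnan(q) & ~np.isnan(r)
    if not shared.any():
        return np.nan  # no columns in common, similarity is undefined
    q, r = q[shared], r[shared]
    denom = np.linalg.norm(q) * np.linalg.norm(r)
    # A zero vector has no direction, so return NaN in that case too.
    return float(q @ r / denom) if denom else np.nan

cols = list("ABCDEF")
query = pd.Series([3, 2, np.nan, 5, np.nan, 4], index=cols)
df = pd.DataFrame(
    [[2, 1, 3, np.nan, 4, 5],
     [1, np.nan, 2, 4, np.nan, 3]],
    columns=cols,
)

# One similarity per df row, each computed on that row's overlap with the query.
similarities = df.apply(
    lambda row: masked_cosine_similarity(query.to_numpy(), row.to_numpy()),
    axis=1,
)
print(similarities)
```

Each score here comes from a different set of columns (A, B, F for the first row; A, D, F for the second), so the scores are not strictly comparable; a row that overlaps the query in only one or two columns can score deceptively high.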

Tags:

python

pandas

trigonometry

toothsie
