
Matching IDs Between Pandas DataFrames and Applying Function

I have two data frames that look like the following:

df_A:

ID    x     y
a     0     0
c     3     2
b     2     5

df_B:

ID    x     y
a     2     1
c     3     5
b     1     2

I want to add a column to df_B that holds the Euclidean distance between the (x, y) coordinates in df_B and those in df_A for each identifier. The desired result would be:

ID    x     y    dist
a     2     1    2.236
c     3     5    3
b     1     2    3.162

The identifiers are not necessarily going to be in the same order. I know how to do this by looping through the rows of df_A and finding the matching ID in df_B, but I was hoping to avoid using a for loop since this will be used on data with tens of millions of rows. Is there some way to use apply but condition it on matching IDs?

Megan asked Feb 05 '23 11:02


1 Answer

If ID isn't the index, make it so.

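# Index both frames by ID so that arithmetic matches rows on ID
df_B.set_index('ID', inplace=True)
df_A.set_index('ID', inplace=True)

# Square the differences, sum across x and y for each row, then take the root
df_B['dist'] = ((df_A - df_B) ** 2).sum(axis=1) ** .5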

pandas aligns both frames on their index and column labels before subtracting, so rows are matched by ID regardless of their order, and no loop or apply is needed.
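For anyone who wants to check the alignment behaviour, here is a minimal self-contained sketch using the sample data from the question, with df_B's rows shuffled on purpose. The names A, B, diff and result are only illustrative, and passing skipna=False is my own assumption about what is wanted, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frames from the question; df_B's rows are deliberately in a different order
df_A = pd.DataFrame({"ID": ["a", "c", "b"], "x": [0, 3, 2], "y": [0, 2, 5]})
df_B = pd.DataFrame({"ID": ["b", "a", "c"], "x": [1, 2, 3], "y": [2, 1, 5]})

# Work on ID-indexed copies so the original frames stay untouched
A = df_A.set_index("ID")
B = df_B.set_index("ID")

# Subtraction aligns on index labels (ID) and column labels (x, y)
diff = A - B

# skipna=False (an assumption, see note below) keeps NaN for IDs found in only one frame
B["dist"] = np.sqrt((diff ** 2).sum(axis=1, skipna=False))

# Bring ID back as a regular column; rows keep df_B's original order
result = B.reset_index()
print(result)
```

With the default skipna=True, an ID that exists in only one frame would produce an all-NaN difference row whose sum is 0, so it would show a distance of 0 instead of NaN. If the two frames may not contain exactly the same IDs, skipna=False (or an explicit merge on ID) is probably the safer choice.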

piRSquared answered Feb 08 '23 15:02