Matching IDs Between Pandas DataFrames and Applying Function

Question

I have two data frames that look like the following:

df_A:

ID    x     y
a     0     0
c     3     2
b     2     5

df_B:

ID    x     y
a     2     1
c     3     5
b     1     2

I want to add a column to df_B containing the Euclidean distance between each identifier's (x, y) coordinates in df_B and the coordinates for the same identifier in df_A. The desired result would be:

ID    x     y    dist
a     2     1    2.236
c     3     5    3
b     1     2    3.162

The identifiers are not necessarily going to be in the same order. I know how to do this by looping through the rows of df_A and finding the matching ID in df_B, but I was hoping to avoid using a for loop since this will be used on data with tens of millions of rows. Is there some way to use apply but condition it on matching IDs?

piRSquared · Accepted Answer

If ID isn't the index, make it so.

df_B.set_index('ID', inplace=True)
df_A.set_index('ID', inplace=True)

df_B['dist'] = ((df_A - df_B) ** 2).sum(1) ** .5

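Arithmetic between two DataFrames aligns rows on index labels and columns on column names, so once ID is the index the rows are matched by ID regardless of their order, and simply doing the math should just work.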
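For anyone who wants to try this end to end, here is a minimal sketch using the sample data from the question. It is one way to apply the approach, not the only one. Some details are my own assumptions rather than part of the answer above: df_B is deliberately listed in a different order, a hypothetical extra ID "d" with no match in df_A is added, and the frames are copied instead of modified in place. The skipna=False argument is also an addition: with the default, a row whose differences are all NaN sums to 0, so an unmatched ID would show a distance of 0 rather than NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df_A = pd.DataFrame({"ID": ["a", "c", "b"], "x": [0, 3, 2], "y": [0, 2, 5]})

# Same IDs in a different order, plus a hypothetical ID "d"
# that has no counterpart in df_A (added for illustration only).
df_B = pd.DataFrame({"ID": ["b", "a", "c", "d"], "x": [1, 2, 3, 7], "y": [2, 1, 5, 7]})

# set_index returns new frames here, leaving df_A and df_B untouched.
A = df_A.set_index("ID")
B = df_B.set_index("ID")

# Subtraction aligns rows by ID label and columns by name, not by position.
diff = A[["x", "y"]] - B[["x", "y"]]

# skipna=False keeps NaN for IDs present in only one frame;
# the default would turn an all-NaN row into a sum of 0.
B["dist"] = np.sqrt((diff ** 2).sum(axis=1, skipna=False))

# Assignment aligned on B's index, so B keeps its original row order.
result = B.reset_index()
print(result)
```

If every ID is guaranteed to appear in both frames, the skipna argument makes no difference and the one-liner from the answer gives the same distances.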

Tags: performance, python, pandas, numpy, apply

Megan

1 Answers

piRSquared