I have two data frames that look like the following:
df_A:
ID x y
a 0 0
c 3 2
b 2 5
df_B:
ID x y
a 2 1
c 3 5
b 1 2
I want to add a column to df_B that is the Euclidean distance between the x, y coordinates in df_B and those in df_A for each identifier. The desired result would be:
ID x y dist
a 2 1 2.236
c 3 5 3
b 1 2 3.162
The identifiers are not necessarily going to be in the same order. I know how to do this by looping through the rows of df_A and finding the matching ID in df_B, but I was hoping to avoid using a for loop since this will be used on data with tens of millions of rows. Is there some way to use apply but condition it on matching IDs?
If ID isn't the index, make it so.
df_B.set_index('ID', inplace=True)
df_A.set_index('ID', inplace=True)
df_B['dist'] = ((df_A - df_B) ** 2).sum(1) ** .5
Because pandas aligns arithmetic on index and column labels, the subtraction matches rows by ID regardless of row order, so simply doing the math should just work.
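For anyone who wants to check this on the sample data, here is a minimal self-contained sketch. The frames are rebuilt from the question, with df_B's rows deliberately shuffled to show that order doesn't matter. The names A, B, diff and result are just illustrative, and using np.hypot plus an explicit reindex is my own variation on the one-liner above, not something the original answer requires.

```python
import numpy as np
import pandas as pd

# Sample data from the question; df_B is intentionally in a different row order.
df_A = pd.DataFrame({"ID": ["a", "c", "b"], "x": [0, 3, 2], "y": [0, 2, 5]})
df_B = pd.DataFrame({"ID": ["b", "a", "c"], "x": [1, 2, 3], "y": [2, 1, 5]})

# Index both frames by ID so arithmetic is matched on the identifier.
A = df_A.set_index("ID")
B = df_B.set_index("ID")

# Put df_A's rows in df_B's order, then take the coordinate differences.
# An ID missing from df_A becomes a row of NaN here.
diff = B[["x", "y"]] - A[["x", "y"]].reindex(B.index)

# np.hypot computes sqrt(dx**2 + dy**2) element-wise and propagates NaN.
B["dist"] = np.hypot(diff["x"], diff["y"])

result = B.reset_index()
print(result)
```

One thing to watch if the two frames don't hold exactly the same IDs: .sum(1) skips NaN by default, so the one-liner above would report a distance of 0 for an unmatched ID, whereas np.hypot leaves it as NaN.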