Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging pandas dataframes based on nearest value(s)

Tags:

python

pandas

I have two dataframes, say A and B, that have some columns named attr1, attr2, attrN.

I have a certain distance measure, and I would like to merge the dataframes, such that each row in A is merged with the row in B that has the shortest distance between attributes. Note that rows in B can be repeated when merging.

For example (with one attribute to keep things simple), merging these two tables using absolute difference distance |A.attr1 - B.att1|

A | attr1      B | attr1
0 | 10         0 | 15
1 | 20         1 | 27
2 | 30         2 | 80

should yield the following merged table

M | attr1_A  attr1_B
0 | 10       15
1 | 20       15
2 | 30       27

My current way of doing this is slow and is based on comparing each row of A with each row of B, but code is also not clear because I have to preserve indices for merging and I am not satisfied at all, but I cannot come up with a better solution.

How can I perform the merge as above using pandas? Are there any convenience methods or functions that can be helpful here?

EDIT: Just to clarify, in the dataframes there are also other columns which are not used in the distance calculation, but have to be merged as well.

like image 791
AkiRoss Avatar asked Feb 04 '23 15:02

AkiRoss


1 Answers

One way you could do it as follows:

A = pd.DataFrame({'attr1':[10,20,30]})
B = pd.DataFrame({'attr1':[15,15,27]})

Use a cross join to get all combinations

Update for 1.2+ pandas use how='cross'

merge_AB = A.merge(B, how='cross', suffixes = ('_A', '_B'))

Older pandas version use psuedo key...

A = A.assign(key=1)
B = B.assign(key=1)

merged_AB =pd.merge(A,B, on='key',suffixes=('_A','_B'))

Now let's find the min distances in merged_AB

M = merged_AB.groupby('attr1_A').apply(lambda x:abs(x['attr1_A']-x['attr1_B'])==abs(x['attr1_A']-x['attr1_B']).min())

merged_AB[M.values].drop_duplicates().drop('key',axis=1)

Output:

   attr1_A  attr1_B
0       10       15
3       20       15
8       30       27
like image 142
Scott Boston Avatar answered Feb 07 '23 19:02

Scott Boston