Merging pandas dataframes based on nearest value(s)

Question

I have two dataframes, say A and B, that have some columns named attr1, attr2, attrN.

I have a certain distance measure, and I would like to merge the dataframes, such that each row in A is merged with the row in B that has the shortest distance between attributes. Note that rows in B can be repeated when merging.

For example (with one attribute to keep things simple), merging these two tables using absolute difference distance |A.attr1 - B.att1|

A | attr1      B | attr1
0 | 10         0 | 15
1 | 20         1 | 27
2 | 30         2 | 80

should yield the following merged table

M | attr1_A  attr1_B
0 | 10       15
1 | 20       15
2 | 30       27

My current way of doing this is slow and is based on comparing each row of A with each row of B, but code is also not clear because I have to preserve indices for merging and I am not satisfied at all, but I cannot come up with a better solution.

How can I perform the merge as above using pandas? Are there any convenience methods or functions that can be helpful here?

EDIT: Just to clarify, in the dataframes there are also other columns which are not used in the distance calculation, but have to be merged as well.

Scott Boston · Accepted Answer

One way you could do it as follows:

A = pd.DataFrame({'attr1':[10,20,30]})
B = pd.DataFrame({'attr1':[15,15,27]})

Use a cross join to get all combinations

Update for 1.2+ pandas use how='cross'

merge_AB = A.merge(B, how='cross', suffixes = ('_A', '_B'))

Older pandas version use psuedo key...

A = A.assign(key=1)
B = B.assign(key=1)

merged_AB =pd.merge(A,B, on='key',suffixes=('_A','_B'))

Now let's find the min distances in merged_AB

M = merged_AB.groupby('attr1_A').apply(lambda x:abs(x['attr1_A']-x['attr1_B'])==abs(x['attr1_A']-x['attr1_B']).min())

merged_AB[M.values].drop_duplicates().drop('key',axis=1)

Output:

   attr1_A  attr1_B
0       10       15
3       20       15
8       30       27

Merging pandas dataframes based on nearest value(s)

Tags:

python

pandas

AkiRoss

1 Answers

Scott Boston

Recent Activity

Donate For Us

Merging pandas dataframes based on nearest value(s)

Tags:

python

pandas

AkiRoss

1 Answers

Scott Boston

Related questions

Recent Activity

Donate For Us