Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nearest neighbor matching in Pandas

Tags:

python

pandas

Given two DataFrames (t1, t2), both with a column 'x', how would I append a column to t1 with the ID of t2 whose 'x' value is the nearest to the 'x' value in t1?

t1:
id  x
1   1.49
2   2.35

t2:
id  x
3   2.36
4   1.5

output:
id  id2
1   4
2   3

I can do this by creating a new DataFrame and iterating on t1.groupby() and doing look ups on t2 then merging, but this take incredibly long given a 17 million row t1 DataFrame.

Is there a better way to accomplish? I've scoured the pandas docs regarding groupby, apply, transform, agg, etc. But an elegant solution has yet to present itself despite my thought that this would be a common problem.

like image 309
robotshapes Avatar asked Mar 05 '23 03:03

robotshapes


1 Answers

Using merge_asof

df = pd.merge_asof(df1.sort_values('x'),
                   df2.sort_values('x'),
                   on='x', 
                   direction='nearest', 
                   suffixes=['', '_2'])

print(df)
Out[975]: 
   id     x  id_2
0   3  0.87     6
1   1  1.49     5
2   2  2.35     4

Method 2 reindex

df1['id2']=df2.set_index('x').reindex(df1.x,method='nearest').values
df1
   id     x  id2
0   1  1.49    4
1   2  2.35    3
like image 60
BENY Avatar answered Mar 18 '23 18:03

BENY