Nearest neighbor matching in Pandas

Question

Given two DataFrames (t1, t2), both with a column 'x', how would I append a column to t1 with the ID of t2 whose 'x' value is the nearest to the 'x' value in t1?

t1:
id  x
1   1.49
2   2.35

t2:
id  x
3   2.36
4   1.5

output:
id  id2
1   4
2   3

I can do this by creating a new DataFrame and iterating on t1.groupby() and doing look ups on t2 then merging, but this take incredibly long given a 17 million row t1 DataFrame.

Is there a better way to accomplish? I've scoured the pandas docs regarding groupby, apply, transform, agg, etc. But an elegant solution has yet to present itself despite my thought that this would be a common problem.

BENY · Accepted Answer

Using merge_asof

df = pd.merge_asof(df1.sort_values('x'),
                   df2.sort_values('x'),
                   on='x', 
                   direction='nearest', 
                   suffixes=['', '_2'])

print(df)
Out[975]: 
   id     x  id_2
0   3  0.87     6
1   1  1.49     5
2   2  2.35     4

Method 2 reindex

df1['id2']=df2.set_index('x').reindex(df1.x,method='nearest').values
df1
   id     x  id2
0   1  1.49    4
1   2  2.35    3

Nearest neighbor matching in Pandas

Tags:

python

pandas

robotshapes

1 Answers

BENY

Recent Activity

Donate For Us

Nearest neighbor matching in Pandas

Tags:

python

pandas

robotshapes

1 Answers

BENY

Related questions

Recent Activity

Donate For Us