Given two DataFrames (t1, t2), both with a column 'x', how would I append a column to t1 with the ID of t2 whose 'x' value is the nearest to the 'x' value in t1?
t1:
id x
1 1.49
2 2.35
t2:
id x
3 2.36
4 1.5
output:
id id2
1 4
2 3
I can do this by creating a new DataFrame and iterating on t1.groupby() and doing look ups on t2 then merging, but this take incredibly long given a 17 million row t1 DataFrame.
Is there a better way to accomplish? I've scoured the pandas docs regarding groupby, apply, transform, agg, etc. But an elegant solution has yet to present itself despite my thought that this would be a common problem.
Using merge_asof
df = pd.merge_asof(df1.sort_values('x'),
df2.sort_values('x'),
on='x',
direction='nearest',
suffixes=['', '_2'])
print(df)
Out[975]:
id x id_2
0 3 0.87 6
1 1 1.49 5
2 2 2.35 4
Method 2 reindex
df1['id2']=df2.set_index('x').reindex(df1.x,method='nearest').values
df1
id x id2
0 1 1.49 4
1 2 2.35 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With