I'm trying to merge two Pandas dataframes on two columns. One column has a unique identifier that could be used to simply .merge() the two dataframes. However, the second column merge would actually use .merge_asof() because it would need to find the closest date, not an exact date match.
There is a similar question here: Pandas Merge on Name and Closest Date, but it was asked and answered nearly three years ago, and merge_asof() is a much newer addition.
I asked a similar question here a couple of months ago, but the solution only needed to use merge_asof() without any exact matches required.
In the interest of including some code, it would look something like this:
df = pd.merge_asof(df1, df2, left_on=['ID','date_time'], right_on=['ID','date_time'])
where the IDs will match exactly, but the date_times will be "near matches".
Any help is greatly appreciated.
Consider merging first on the ID and then running DataFrame.apply to return, for each row, the highest date_time from the first dataframe among the matched IDs that is less than the current row's date_time from the second dataframe.
import pandas as pd

# INITIAL MERGE (CROSS-PRODUCT OF ALL ID PAIRINGS)
mdf = pd.merge(df1, df2, on=['ID'])

def f(row):
    # Latest df1 date_time for this ID that is strictly before the row's df2 date_time
    col = mdf[(mdf['ID'] == row['ID']) &
              (mdf['date_time_x'] < row['date_time_y'])]['date_time_x'].max()
    return col

# FILTER BY MATCHED DATES TO CONDITIONAL MAX
mdf = mdf[mdf['date_time_x'] == mdf.apply(f, axis=1)].reset_index(drop=True)
This assumes you want to keep all rows of df2 (i.e., right join). Simply flip _x / _y suffixes for left join.
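As an alternative to the merge-then-apply approach, merge_asof() itself accepts a by argument that matches exactly on one or more columns before doing the nearest-key match on the on column. The following is only a minimal sketch: the sample frames, the left_val / right_val columns, and the choice of direction="nearest" are assumptions made for illustration, not part of the original question's data.

```python
import pandas as pd

# Hypothetical sample data; only the ID and date_time column names come from the question
df1 = pd.DataFrame({
    "ID": ["a", "a", "b"],
    "date_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 12:00", "2017-01-01 11:00"]),
    "left_val": [1, 2, 3],
})
df2 = pd.DataFrame({
    "ID": ["a", "b", "b"],
    "date_time": pd.to_datetime(
        ["2017-01-01 11:50", "2017-01-01 09:00", "2017-01-01 11:05"]),
    "right_val": [10, 20, 30],
})

# merge_asof requires both frames to be sorted by the "on" key
df1 = df1.sort_values("date_time")
df2 = df2.sort_values("date_time")

# Exact match on ID (by), closest match on date_time (on)
merged = pd.merge_asof(df1, df2, on="date_time", by="ID", direction="nearest")
print(merged)
```

The result keeps every row of df1 and its date_time values. The default direction is "backward" (the last df2 row at or before each df1 timestamp), and the tolerance argument can cap how far apart two timestamps may be and still count as a match.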