I have found this nice function, pandas.merge_asof. From the documentation:
pandas.merge_asof(left, right, on=None, left_on=None, right_on=None)
Parameters:
left : DataFrame
right : DataFrame
on : label
Field name to join on. Must be found in both DataFrames.
The data MUST be ordered.
Furthermore, this must be a numeric column, such as datetimelike, integer, or float.
On or left_on/right_on must be given.
It works as expected. However, the merged DataFrame keeps only one of the "on" columns: the one that was originally in left. I need to keep both, so that after
mydf = pandas.merge_asof(left, right, on='Time')
mydf contains the Time column from both left and right.
Example data:
import pandas as pd

a = pd.DataFrame(data=pd.date_range('20100201', periods=100, freq='6h3min'), columns=['Time'])
b = pd.DataFrame(data=pd.date_range('20100201', periods=24, freq='1h'), columns=['Time'])
b['val'] = range(b.shape[0])
out = pd.merge_asof(a, b, on='Time', direction='forward', tolerance=pd.Timedelta('30min'))
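To make the problem concrete, here is a minimal sketch that rebuilds the example data and inspects the columns of the result. It only uses the names from the question (a, b, out); building the frames from dicts is just a shorter way of writing the same thing.

```python
import pandas as pd

# Same data as in the question: irregular timestamps on the left,
# hourly timestamps with a value column on the right.
a = pd.DataFrame({"Time": pd.date_range("20100201", periods=100, freq="6h3min")})
b = pd.DataFrame({"Time": pd.date_range("20100201", periods=24, freq="1h")})
b["val"] = range(len(b))

out = pd.merge_asof(a, b, on="Time", direction="forward",
                    tolerance=pd.Timedelta("30min"))

# Only one Time column survives, and it holds the left-hand timestamps.
print(out.columns.tolist())  # ['Time', 'val']
```

The matched timestamp from b is not recoverable from out, which is what the question is asking for.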
Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key.
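As a small illustration of the sorting requirement, the sketch below sorts both frames by the key before calling merge_asof. The frame and column names (left, right, t, left_val, right_val) are made up for this example and do not come from the question.

```python
import pandas as pd

# Hypothetical, deliberately unsorted inputs.
left = pd.DataFrame({"t": [5, 1, 10], "left_val": ["b", "a", "c"]})
right = pd.DataFrame({"t": [7, 2, 3, 6, 1], "right_val": [7, 2, 3, 6, 1]})

# merge_asof expects both inputs sorted by the key, so sort first.
left_sorted = left.sort_values("t")
right_sorted = right.sort_values("t")

# Default direction is 'backward': each left row takes the last right row
# whose key is less than or equal to the left key.
merged = pd.merge_asof(left_sorted, right_sorted, on="t")
print(merged)
```

Passing the unsorted frames directly would be rejected by pandas, so sorting up front is the simplest way to stay safe.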
I think one possible solution is to rename the columns, so that each DataFrame keeps its own key column, and then merge with left_on and right_on:
out = pd.merge_asof(a.rename(columns={'Time': 'Time1'}),
                    b.rename(columns={'Time': 'Time2'}),
                    left_on='Time1',
                    right_on='Time2',
                    direction='forward',
                    tolerance=pd.Timedelta('30min'))
print(out.head())
                Time1      Time2  val
0 2010-02-01 00:00:00 2010-02-01  0.0
1 2010-02-01 06:03:00        NaT  NaN
2 2010-02-01 12:06:00        NaT  NaN
3 2010-02-01 18:09:00        NaT  NaN
4 2010-02-02 00:12:00        NaT  NaN
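A possible variation, if you would rather keep on='Time' and the original column name on the left, is to copy the right-hand key into an extra column before merging. This is only a sketch under that assumption; the column name Time_b is made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({"Time": pd.date_range("20100201", periods=100, freq="6h3min")})
b = pd.DataFrame({"Time": pd.date_range("20100201", periods=24, freq="1h")})
b["val"] = range(len(b))

# Duplicate the right key under a new (hypothetical) name so it is carried
# along as an ordinary data column instead of being merged into 'Time'.
b_with_copy = b.assign(Time_b=b["Time"])

out = pd.merge_asof(a, b_with_copy, on="Time", direction="forward",
                    tolerance=pd.Timedelta("30min"))
print(out.head())
```

Here Time holds the left timestamps and Time_b holds the matched right timestamp, or NaT where nothing fell within the tolerance.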