I've got two dataframes (logs and failures), which I would like to merge so that I add to logs a column holding the value of the closest date found in failures. The code to generate logs, failures, and the desired output is below:
import pandas as pd

logs = pd.DataFrame({'date-time': pd.Series(['23/10/2015 10:20:54', '22/10/2015 09:51:32', '21/10/2015 06:51:32',
                                             '28/10/2015 16:59:32', '25/10/2015 04:41:32', '24/10/2015 11:50:11']),
                     'var1': pd.Series([0, 1, 3, 1, 2, 4])})
logs['date-time'] = pd.to_datetime(logs['date-time'])

failures = pd.DataFrame({'date': pd.Series(['23/10/2015 00:00:00', '22/10/2015 00:00:00', '21/10/2015 00:00:00']),
                         'failure': pd.Series([1, 1, 1])})
failures['date'] = pd.to_datetime(failures['date'])

output = pd.DataFrame({'date-time': pd.Series(['23/10/2015 10:20:54', '22/10/2015 09:51:32', '21/10/2015 06:51:32',
                                               '28/10/2015 16:59:32', '25/10/2015 04:41:32', '24/10/2015 11:50:11']),
                       'var1': pd.Series([0, 1, 3, 1, 2, 4]),
                       'closest_failure': pd.Series(['23/10/2015 00:00:00', '22/10/2015 00:00:00', '21/10/2015 00:00:00',
                                                     '23/10/2015 00:00:00', '23/10/2015 00:00:00', '23/10/2015 00:00:00'])})
output['date-time'] = pd.to_datetime(output['date-time'])
Any ideas? The real dataset is very large, so efficiency is also a concern.
In Pandas >= 0.19.0 you can use pandas.merge_asof to get near matches. With 0.19 you're limited to taking the most recent failure value before or at the log value; with 0.20 you can get the nearest match in either direction.
Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys.
For each row in the left DataFrame, we select the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. Both DataFrames must be sorted by the key.
In [3]: failures.sort_values("date", inplace=True)
In [6]: logs2 = pd.DataFrame({'date-time': pd.Series(['23/10/2015 10:20:54', '22/10/2015 09:51:32',
   ...:                                               '21/10/2015 06:51:32', '28/10/2015 16:59:32',
   ...:                                               '25/10/2015 04:41:32', '24/10/2015 11:50:11',
   ...:                                               '20/10/2015 01:02:03']),
   ...:                       'var1': pd.Series([0, 1, 3, 1, 2, 4, 99])})
In [7]: logs2['date-time']=pd.to_datetime(logs2['date-time'])
In [8]: logs2.sort_values("date-time", inplace=True)
In [9]: logs2
Out[9]:
date-time var1
6 2015-10-20 01:02:03 99
2 2015-10-21 06:51:32 3
1 2015-10-22 09:51:32 1
0 2015-10-23 10:20:54 0
5 2015-10-24 11:50:11 4
4 2015-10-25 04:41:32 2
3 2015-10-28 16:59:32 1
In [10]: pd.merge_asof(logs2, failures, left_on="date-time", right_on="date")
Out[10]:
date-time var1 date failure
0 2015-10-20 01:02:03 99 NaT NaN
1 2015-10-21 06:51:32 3 2015-10-21 1.0
2 2015-10-22 09:51:32 1 2015-10-22 1.0
3 2015-10-23 10:20:54 0 2015-10-23 1.0
4 2015-10-24 11:50:11 4 2015-10-23 1.0
5 2015-10-25 04:41:32 2 2015-10-23 1.0
6 2015-10-28 16:59:32 1 2015-10-23 1.0
In [11]: pd.merge_asof(logs2, failures, left_on="date-time", right_on="date", direction="nearest")
Out[11]:
date-time var1 date failure
0 2015-10-20 01:02:03 99 2015-10-21 1
1 2015-10-21 06:51:32 3 2015-10-21 1
2 2015-10-22 09:51:32 1 2015-10-22 1
3 2015-10-23 10:20:54 0 2015-10-23 1
4 2015-10-24 11:50:11 4 2015-10-23 1
5 2015-10-25 04:41:32 2 2015-10-23 1
6 2015-10-28 16:59:32 1 2015-10-23 1
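Applied to the original logs and failures from the question, the nearest-direction merge reproduces the desired closest_failure column once both frames are sorted on their keys. A minimal sketch (assuming pandas >= 0.20; note the rows come out ordered by date-time rather than in the question's original row order, and the variable name nearest is just for illustration):

import pandas as pd

# logs and failures built as in the question
logs = pd.DataFrame({'date-time': pd.to_datetime(['23/10/2015 10:20:54', '22/10/2015 09:51:32', '21/10/2015 06:51:32',
                                                  '28/10/2015 16:59:32', '25/10/2015 04:41:32', '24/10/2015 11:50:11'],
                                                 dayfirst=True),
                     'var1': [0, 1, 3, 1, 2, 4]})
failures = pd.DataFrame({'date': pd.to_datetime(['23/10/2015', '22/10/2015', '21/10/2015'], dayfirst=True),
                         'failure': [1, 1, 1]})

# both frames must be sorted on their merge keys
nearest = pd.merge_asof(logs.sort_values('date-time'),
                        failures.sort_values('date'),
                        left_on='date-time', right_on='date',
                        direction='nearest')

# rename the matched date to the column name used in the desired output
nearest = nearest.rename(columns={'date': 'closest_failure'}).drop('failure', axis=1)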
You can reindex with method="nearest". There may be a neater way, but using a Series with the failure dates as both the index and the values works:
In [11]: failures_dt = pd.Series(failures["date"].values, failures["date"])
In [12]: failures_dt.reindex(logs["date-time"], method="nearest")
Out[12]:
date-time
2015-10-23 10:20:54 2015-10-23
2015-10-22 09:51:32 2015-10-22
2015-10-21 06:51:32 2015-10-21
2015-10-28 16:59:32 2015-10-23
2015-10-25 04:41:32 2015-10-23
2015-10-24 11:50:11 2015-10-23
dtype: datetime64[ns]
In [13]: logs["nearest"] = failures_dt.reindex(logs["date-time"], method="nearest").values
In [14]: logs
Out[14]:
date-time var1 nearest
0 2015-10-23 10:20:54 0 2015-10-23
1 2015-10-22 09:51:32 1 2015-10-22
2 2015-10-21 06:51:32 3 2015-10-21
3 2015-10-28 16:59:32 1 2015-10-23
4 2015-10-25 04:41:32 2 2015-10-23
5 2015-10-24 11:50:11 4 2015-10-23
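If you also want to pull across other columns from failures (for example the failure value itself) rather than just the nearest date, the same nearest-match lookup can be expressed with Index.get_indexer, which returns the integer positions of the nearest matches. A sketch along those lines; the sorted copy and the added column names are just for illustration:

# sort the failure dates so that nearest-position lookups are allowed
failures_sorted = failures.sort_values("date")
fail_idx = pd.DatetimeIndex(failures_sorted["date"])

# position of the nearest failure date for each log timestamp
pos = fail_idx.get_indexer(logs["date-time"], method="nearest")

logs["nearest"] = fail_idx[pos].values                     # nearest failure date
logs["failure"] = failures_sorted["failure"].values[pos]   # corresponding failure value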