I have a data frame with a datetime column every 10 minutes and a numerical value:
df1 = pd.DataFrame({'time' : pd.date_range('1/1/2018', periods=20, freq='10min'), 'value' : np.random.randint(2, 20, size=20)})
And another with a schedule of events, with a start time and end time. There can be multiple events happening at the same time:
df2 = pd.DataFrame({'start_time' : ['2018-01-01 00:00:00', '2018-01-01 00:00:00','2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00' ], 'end_time' : ['2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00','2018-01-01 02:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00'], 'event' : ['A', 'B', 'C', 'D', 'E', 'F'] })
df2[['start_time', 'end_time']] = df2.iloc[:,0:2].apply(pd.to_datetime)
I want to do a left join on df1, with all events that fall inside the start and end times. My output table should be:
time value event
0 2018-01-01 00:00:00 5 A
1 2018-01-01 00:00:00 5 B
2 2018-01-01 00:10:00 15 A
3 2018-01-01 00:10:00 15 B
4 2018-01-01 00:20:00 16 A
5 2018-01-01 00:20:00 16 B
.....
17 2018-01-01 02:50:00 7 F
I attempted these SO solutions, but they fail because of duplicate time intervals.
Setup (Only using a few entries from df1
for brevity):
df1 = pd.DataFrame({'time' : pd.date_range('1/1/2018', periods=20, freq='10min'), 'value' : np.random.randint(2, 20, size=20)})
df2 = pd.DataFrame({'start_time' : ['2018-01-01 00:00:00', '2018-01-01 00:00:00','2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00' ], 'end_time' : ['2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00','2018-01-01 02:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00'], 'event' : ['A', 'B', 'C', 'D', 'E', 'F'] })
df1 = df1.sample(5)
df2[['start_time', 'end_time']] = df2.iloc[:,0:2].apply(pd.to_datetime)
You can use a couple of straightfoward list comprehensions to achieve your result. This answer assumes that all date columns are in fact, of type datetime
in your DataFrame:
Step 1
Find all events that occur within a particular time range using a list comprehension and simple interval checking:
packed = list(zip(df2.start_time, df2.end_time, df2.event))
df1['event'] = [[ev for strt, end, ev in packed if strt <= el <= end] for el in df1.time]
time value event
2 2018-01-01 00:20:00 8 [A, B]
14 2018-01-01 02:20:00 14 [F]
8 2018-01-01 01:20:00 6 [C, D, E]
19 2018-01-01 03:10:00 16 []
4 2018-01-01 00:40:00 7 [A, B]
Step 2:
Finally, explode each list from the last result to a new row using another list comprehension:
pd.DataFrame(
[[t, val, e] for t, val, event in zip(df1.time, df1.value, df1.event)
for e in event
], columns=df1.columns
)
Output:
time value event
0 2018-01-01 00:20:00 8 A
1 2018-01-01 00:20:00 8 B
2 2018-01-01 02:20:00 14 F
3 2018-01-01 01:20:00 6 C
4 2018-01-01 01:20:00 6 D
5 2018-01-01 01:20:00 6 E
6 2018-01-01 00:40:00 7 A
7 2018-01-01 00:40:00 7 B
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With