I have a pandas DataFrame like:
StartDate, EndDate, uniqueid
2015-07-20 15:04:13, 2015-07-20 17:56:00, 1
2015-07-20 20:32:16, 2015-07-20 20:56:11, 3
2015-07-20 22:35:59, 2015-07-20 22:51:10, 11
2015-07-20 18:00:51, 2015-07-20 18:42:01, 12
I want to create a new column that holds an indicator value: 1 if any row in the frame has a StartDate less than an hour after this row's EndDate, and 0 otherwise. An example output would look like:
StartDate, EndDate, uniqueid, WithinHour
2015-07-20 15:04:13, 2015-07-20 17:56:00, 1, 1
2015-07-20 20:32:16, 2015-07-20 20:56:11, 3, 0
2015-07-20 22:35:59, 2015-07-20 22:51:10, 11, 0
2015-07-20 18:00:51, 2015-07-20 18:42:01, 12, 0
Because row 1 has
EndDate 2015-07-20 17:56:00
and row 4 has
StartDate 2015-07-20 18:00:51
which is within 1 hour.
The rows should never overlap in start and end date, so I think I should sort by StartDate, create a new column holding (StartDate of the next row) - (EndDate of this row), then set the new column to 1 if that gap is less than an hour and 0 otherwise. I'm not sure how to do this, though.
Here is one way. For each EndDate, test whether any StartDate falls within the hour that follows it. The comparison yields a boolean Series, and any() reduces it to a single value that is True if at least one row matches:
# Write to a new column so the existing uniqueid values are not overwritten.
df['WithinHour'] = df.EndDate.apply(lambda dt: (((df.StartDate - pd.Timedelta(1, 'h')) <= dt)
                                                & (df.StartDate >= dt)).any())
df.WithinHour
Out[57]:
0 True
1 False
2 False
3 False
Name: WithinHour, dtype: bool
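The sort-and-shift idea described in the question can also be sketched directly. This is a minimal illustration rather than a definitive implementation: it assumes the rows never overlap (as the question states), rebuilds the sample frame from the question, and uses the names ordered, gap and within purely for illustration. Like the apply version above, it treats the one-hour bound as inclusive; change <= to < if a strict "less than an hour" is wanted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-07-20 15:04:13", "2015-07-20 20:32:16",
        "2015-07-20 22:35:59", "2015-07-20 18:00:51"]),
    "EndDate": pd.to_datetime([
        "2015-07-20 17:56:00", "2015-07-20 20:56:11",
        "2015-07-20 22:51:10", "2015-07-20 18:42:01"]),
    "uniqueid": [1, 3, 11, 12],
})

# Sort by start time so each row's successor is the next interval to begin.
ordered = df.sort_values("StartDate")

# Gap between this row's EndDate and the next row's StartDate
# (NaT for the last row, which has no successor).
gap = ordered["StartDate"].shift(-1) - ordered["EndDate"]

# Comparisons against NaT evaluate to False, so the last row ends up as 0.
within = (gap >= pd.Timedelta(0)) & (gap <= pd.Timedelta(hours=1))

# Assignment aligns on the index, so the original row order is kept.
df["WithinHour"] = within.astype(int)
print(df)
```

Because the rows do not overlap, only the next row by StartDate can be the nearest follower, so one shift replaces the row-by-row scan that apply performs. The astype(int) call gives the 1/0 indicator asked for in the question instead of booleans.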