I have data in a pandas DataFrame with a 1-minute time step. The data is not recorded continuously, and I would like to split it into separate events based on the following condition: only if there is continuous data recorded for 5 minutes or more is it considered an event, and the data for each such event needs to be extracted separately. Is there a way to implement this with a pandas DataFrame?
My data looks like this (with the Event column as the desired result):
Date X Event
2017-06-06 01:08:00 0.019 1
2017-06-06 01:09:00 0.005 1
2017-06-06 01:10:00 0.03 1
2017-06-06 01:11:00 0.005 1
2017-06-06 01:12:00 0.003 1
2017-06-06 01:13:00 0.001 1
2017-06-06 01:14:00 0.039 1
2017-06-06 01:15:00 0.003 1
2017-06-06 01:17:00 0.001 nan
2017-06-06 01:25:00 0.006 nan
2017-06-06 01:26:00 0.006 nan
2017-06-06 01:27:00 0.032 nan
2017-06-06 01:29:00 0.013 2
2017-06-06 01:30:00 0.065 2
2017-06-06 01:31:00 0.013 2
2017-06-06 01:32:00 0.001 2
2017-06-06 01:33:00 0.02 2
2017-06-06 01:38:00 0.05 nan
2017-06-06 01:40:00 0.025 3
2017-06-06 01:41:00 0.01 3
2017-06-06 01:42:00 0.008 3
2017-06-06 01:43:00 0.009 3
2017-06-06 01:44:00 0.038 3
2017-06-06 01:45:00 0.038 3
Any suggestions would be highly appreciated.
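(Note: everything below relies on time arithmetic on the Date column, so it has to be an actual datetime dtype rather than strings. A minimal snippet, assuming the column is named Date:)
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])  # ensure .diff() yields timedeltas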
Using the solution provided by nnnmmm, the result looks like this:
2015-01-01 03:24:00 NaN
2015-01-01 04:59:00 NaN
2015-01-01 05:01:00 NaN
2015-01-01 05:02:00 NaN
2015-01-01 05:03:00 NaN
2015-01-13 01:12:00 1.0
2015-01-13 01:13:00 1.0
2015-01-13 01:14:00 1.0
2015-01-13 01:15:00 1.0
2015-01-13 01:16:00 1.0
2015-01-13 01:49:00 1.0
2015-01-13 01:50:00 1.0
2015-01-13 01:51:00 1.0
2015-01-13 01:52:00 1.0
2015-01-13 01:53:00 1.0
2015-01-13 01:54:00 1.0
2015-01-13 01:55:00 1.0
In this case there is a time gap between 01:16:00 and 01:49:00, so these rows should not be considered the same event; rather, the run starting at 01:49:00 should become the second event.
This is a bit rough (and not very concise), but you could do something like the following. You could of course make it more concise and leave out the intermediate variables, but I left them in here to make it easier to understand what's happening.
import numpy as np
import pandas as pd

df['new'] = (df.reset_index().Date.diff() == pd.Timedelta('1min')).astype(int).values  # 1 where the previous row is exactly 1 minute earlier
df['grp'] = (df.new != 1).cumsum()                        # a new group id starts at every gap (and at the first row)
df['cnt'] = df.groupby('grp')['new'].transform('size')    # number of rows in each group
df['event'] = df['cnt'] > 4                               # groups of 5 or more consecutive rows qualify as events
df['Event'] = ((df.event) & (df.new != 1)).cumsum()       # running event counter, incremented at each event start
df['Event'] = np.where(df.event, df.Event, np.nan)        # blank out the counter for non-event rows
X new grp cnt event Event
Date
2017-06-06 01:08:00 0.019 0 1 8 True 1.0
2017-06-06 01:09:00 0.005 1 1 8 True 1.0
2017-06-06 01:10:00 0.030 1 1 8 True 1.0
2017-06-06 01:11:00 0.005 1 1 8 True 1.0
2017-06-06 01:12:00 0.003 1 1 8 True 1.0
2017-06-06 01:13:00 0.001 1 1 8 True 1.0
2017-06-06 01:14:00 0.039 1 1 8 True 1.0
2017-06-06 01:15:00 0.003 1 1 8 True 1.0
2017-06-06 01:17:00 0.001 0 2 1 False NaN
2017-06-06 01:25:00 0.006 0 3 3 False NaN
2017-06-06 01:26:00 0.006 1 3 3 False NaN
2017-06-06 01:27:00 0.032 1 3 3 False NaN
2017-06-06 01:29:00 0.013 0 4 5 True 2.0
2017-06-06 01:30:00 0.065 1 4 5 True 2.0
2017-06-06 01:31:00 0.013 1 4 5 True 2.0
2017-06-06 01:32:00 0.001 1 4 5 True 2.0
2017-06-06 01:33:00 0.020 1 4 5 True 2.0
2017-06-06 01:38:00 0.050 0 5 1 False NaN
2017-06-06 01:40:00 0.025 0 6 6 True 3.0
2017-06-06 01:41:00 0.010 1 6 6 True 3.0
2017-06-06 01:42:00 0.008 1 6 6 True 3.0
2017-06-06 01:43:00 0.009 1 6 6 True 3.0
2017-06-06 01:44:00 0.038 1 6 6 True 3.0
2017-06-06 01:45:00 0.038 1 6 6 True 3.0
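If you then want the data for each event in its own DataFrame (as the question asks), one way is to drop the non-event rows and group on the Event column; a small sketch building on the frame above, with the helper columns left in place:
# one DataFrame per event, keyed by the event number
events = {int(k): g for k, g in df.dropna(subset=['Event']).groupby('Event')}
# events[2] is now the DataFrame holding the rows of the second event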
There is probably a more elegant way to solve this, but this should do the trick:
import numpy as np
import pandas as pd

# True where the previous timestamp is one minute away
prev_ok = pd.Series(df['Date'].diff().values == np.timedelta64(1, 'm'))
# True where prev_ok is True for a window of four rows ending at the current row
a = prev_ok.rolling(4).sum() == 4
# extend each True back over the previous four rows (this could also be done with a loop)
b = a | a.shift(-1) | a.shift(-2) | a.shift(-3) | a.shift(-4)
# calculate edges from False to True to get the event indices
c = (~a.shift(-3).fillna(False) & a.shift(-4)).cumsum()
# keep the event index only where b is True
df['Event'] = c.mask(~b)
The output is
Date X Event
0 2017-06-06 01:08:00 0.019 1.0
1 2017-06-06 01:09:00 0.005 1.0
2 2017-06-06 01:10:00 0.030 1.0
3 2017-06-06 01:11:00 0.005 1.0
4 2017-06-06 01:12:00 0.003 1.0
5 2017-06-06 01:13:00 0.001 1.0
6 2017-06-06 01:14:00 0.039 1.0
7 2017-06-06 01:15:00 0.003 1.0
8 2017-06-06 01:17:00 0.001 NaN
9 2017-06-06 01:25:00 0.006 NaN
10 2017-06-06 01:26:00 0.006 NaN
11 2017-06-06 01:27:00 0.032 NaN
12 2017-06-06 01:29:00 0.013 2.0
13 2017-06-06 01:30:00 0.065 2.0
14 2017-06-06 01:31:00 0.013 2.0
15 2017-06-06 01:32:00 0.001 2.0
16 2017-06-06 01:33:00 0.020 2.0
17 2017-06-06 01:38:00 0.050 NaN
18 2017-06-06 01:40:00 0.025 NaN
19 2017-06-06 01:42:00 0.010 NaN
20 2017-06-06 01:43:00 0.008 NaN
21 2017-06-06 01:44:00 0.009 NaN
22 2017-06-06 01:45:00 0.038 NaN
Fun fact: the calculation of a is basically an erosion of a 1-D image with a length-4 structuring element, and the calculation of b is a dilation with the same structuring element. Taken together, the computation of b from prev_ok is an opening, i.e. what was True in prev_ok stays True in b only if that True is part of a consecutive group of five Trues.
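To make that concrete, here is a small sketch of the same idea written with explicit morphology (assuming scipy is available; prev_ok is the Series computed above, and names are illustrative). Note that the dilation above actually uses five shifted copies of a, which also marks the first row of each run, so the plain opening is OR-ed with a one-step backward shift here:
import numpy as np
from scipy import ndimage

ok = prev_ok.values  # boolean array: gap to the previous row is exactly 1 minute

# opening = erosion followed by dilation; with a flat length-4 element it keeps
# only those True values that belong to a run of at least four consecutive Trues
opened = ndimage.binary_opening(ok, structure=np.ones(4, dtype=bool))

# the first row of an event has a gap before it (ok is False there); shifting the
# opening back by one row recovers that row, reproducing the b mask from above
b_morph = opened | np.append(opened[1:], False)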