Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas measure elapsed time since a condition

I have the following dataframe:

               Time   Work
2018-12-01 10:00:00     Off
2018-12-01 10:00:02     On
2018-12-01 10:00:05     On
2018-12-01 10:00:06     On
2018-12-01 10:00:07     On
2018-12-01 10:00:09    Off
2018-12-01 10:00:11    Off
2018-12-01 10:00:14     On
2018-12-01 10:00:16     On
2018-12-01 10:00:18     On
2018-12-01 10:00:20    Off

I would like to creat a new column with the elapsed time since the device started working.

               Time   Work    Elapsed Time
2018-12-01 10:00:00    Off               0
2018-12-01 10:00:02     On               2
2018-12-01 10:00:05     On               5
2018-12-01 10:00:06     On               6
2018-12-01 10:00:07     On               7
2018-12-01 10:00:09    Off               0
2018-12-01 10:00:11    Off               0
2018-12-01 10:00:14     On               3
2018-12-01 10:00:16     On               5
2018-12-01 10:00:18     On               7
2018-12-01 10:00:20    Off               0

How can I do it?

like image 737
Rafael Avatar asked Dec 06 '18 17:12

Rafael


2 Answers

You can use groupby:

# df['Time'] = pd.to_datetime(df['Time'], errors='coerce') # Uncomment if needed.
sec = df['Time'].dt.second
df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))

df
                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off             0
1  2018-12-01 10:00:02   On             2
2  2018-12-01 10:00:05   On             5
3  2018-12-01 10:00:06   On             6
4  2018-12-01 10:00:07   On             7
5  2018-12-01 10:00:09  Off             0
6  2018-12-01 10:00:11  Off             0
7  2018-12-01 10:00:14   On             3
8  2018-12-01 10:00:16   On             5
9  2018-12-01 10:00:18   On             7
10 2018-12-01 10:00:20  Off             0

The idea is to extract the seconds portion and subtract the elapsed time from the first moment the state changes from "Off" to "On". This is done using transform and first.

cumsum is used to identify groups:

df.Work.eq('Off').cumsum()

0     1
1     1
2     1
3     1
4     1
5     2
6     3
7     3
8     3
9     3
10    4
Name: Work, dtype: int64

If there's a possibility your device can span multiple minutes while in the "On", then, initialise sec as:

sec = df['Time'].values.astype(np.int64) // 10e8

df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))

df
                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off           0.0
1  2018-12-01 10:00:02   On           2.0
2  2018-12-01 10:00:05   On           5.0
3  2018-12-01 10:00:06   On           6.0
4  2018-12-01 10:00:07   On           7.0
5  2018-12-01 10:00:09  Off           0.0
6  2018-12-01 10:00:11  Off           0.0
7  2018-12-01 10:00:14   On           3.0
8  2018-12-01 10:00:16   On           5.0
9  2018-12-01 10:00:18   On           7.0
10 2018-12-01 10:00:20  Off           0.0
like image 86
cs95 Avatar answered Oct 11 '22 04:10

cs95


IIUC first with transform

(df.Time-df.Time.groupby(df.Work.eq('Off').cumsum()).transform('first')).dt.seconds
Out[1090]: 
0     0
1     2
2     5
3     6
4     7
5     0
6     0
7     3
8     5
9     7
10    0
Name: Time, dtype: int64
like image 25
BENY Avatar answered Oct 11 '22 03:10

BENY