How can I add an extra column that holds the cumulative sum of the time differences, in seconds, for each course? For example, the initial table is:
id_A  course       weight  ts_A                 value
id1   cotton       3.5     2017-04-27 01:35:30  150.000000
id1   cotton       3.5     2017-04-27 01:36:00  416.666667
id1   cotton       3.5     2017-04-27 01:36:30  700.000000
id1   cotton       3.5     2017-04-27 01:37:00  950.000000
id2   cotton blue  5.0     2017-04-27 02:35:30  150.000000
id2   cotton blue  5.0     2017-04-27 02:36:00  450.000000
id2   cotton blue  5.0     2017-04-27 02:36:30  520.666667
id2   cotton blue  5.0     2017-04-27 02:37:00  610.000000
The expected result is:
id_A  course       weight  ts_A                 value       cum_delta_sec
id1   cotton       3.5     2017-04-27 01:35:30  150.000000  0
id1   cotton       3.5     2017-04-27 01:36:00  416.666667  30
id1   cotton       3.5     2017-04-27 01:36:30  700.000000  60
id1   cotton       3.5     2017-04-27 01:37:00  950.000000  90
id2   cotton blue  5.0     2017-04-27 02:35:30  150.000000  0
id2   cotton blue  5.0     2017-04-27 02:36:00  450.000000  30
id2   cotton blue  5.0     2017-04-27 02:36:30  520.666667  60
id2   cotton blue  5.0     2017-04-27 02:37:00  610.000000  90
Cumulative, or span-of-time, measures are the most common way time is used in a measure. Traditionally, a cumulative measure sums data across a span of time.
The cumsum() method returns a DataFrame of running totals. It walks through the values from the top, row by row, adding each value to the total of the rows above it, so the last row contains the sum of all values in each column.
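As a small illustration of that row-by-row behaviour, here is a minimal sketch on a made-up frame. The name toy and the columns a and b are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Running total down each column; the last row equals the column sums
running = toy.cumsum()
print(running)
```

The result has the same shape as the input, which is what makes cumsum convenient for adding a new column alongside existing ones.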
You can chain the diff method with cumsum:
import pandas as pd

# convert ts_A to datetime type
df['ts_A'] = pd.to_datetime(df['ts_A'])
# group by id, then use transform to take the successive time differences in seconds
# (the first row of each group has no predecessor, so fill it with 0) and accumulate them
df['cum_delta_sec'] = df.groupby('id_A')['ts_A'].transform(lambda x: x.diff().dt.total_seconds().fillna(0).cumsum())
df
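To check the diff-then-cumsum idea end to end, here is a self-contained sketch that rebuilds the sample rows from the question. It assumes the rows are already sorted by ts_A within each id_A. It goes through timedeltas and total_seconds() rather than an integer cast, and the intermediate name step_sec is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "id_A": ["id1"] * 4 + ["id2"] * 4,
    "course": ["cotton"] * 4 + ["cotton blue"] * 4,
    "weight": [3.5] * 4 + [5.0] * 4,
    "ts_A": [
        "2017-04-27 01:35:30", "2017-04-27 01:36:00",
        "2017-04-27 01:36:30", "2017-04-27 01:37:00",
        "2017-04-27 02:35:30", "2017-04-27 02:36:00",
        "2017-04-27 02:36:30", "2017-04-27 02:37:00",
    ],
    "value": [150.0, 416.666667, 700.0, 950.0,
              150.0, 450.0, 520.666667, 610.0],
})
df["ts_A"] = pd.to_datetime(df["ts_A"])

# Gap to the previous timestamp within each id_A group, in seconds.
# The first row of each group has no predecessor, so its gap is set to 0.
step_sec = df.groupby("id_A")["ts_A"].diff().dt.total_seconds().fillna(0)

# Running total of those gaps, restarted for every id_A group
df["cum_delta_sec"] = step_sec.groupby(df["id_A"]).cumsum().astype(int)
print(df)
```

If the rows might not be in time order, sorting with df.sort_values(["id_A", "ts_A"]) first keeps the differences non-negative.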