Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas cumsum() reset after hitting max

I have a pandas DataFrame with timedeltas as a cumulative sum of those deltas in a separate column expressed in milliseconds. An example is provided below:

Transaction_ID  Time            TimeDelta       CumSum[ms]
1              00:00:04.500     00:00:00.000    000
2              00:00:04.600     00:00:00.100    100
3              00:00:04.762     00:00:00.162    262
4              00:00:05.543     00:00:00.781    1043
5              00:00:09.567     00:00:04.024    5067
6              00:00:10.654     00:00:01.087    6154
7              00:00:14.300     00:00:03.646    9800
8              00:00:14.532     00:00:00.232    10032
9              00:00:16.500     00:00:01.968    12000
10             00:00:17.543     00:00:01.043    13043

I would like to be able to provide a maximum value for CumSum[ms] after which the cumulative sum would start over again at 0. For example, if the maximum value was 3000 in the above example, the results would look like so:

Transaction_ID  Time            TimeDelta       CumSum[ms]
1              00:00:04.500     00:00:00.000    000
2              00:00:04.600     00:00:00.100    100
3              00:00:04.762     00:00:00.162    262
4              00:00:05.543     00:00:00.781    1043
5              00:00:09.567     00:00:04.024    0
6              00:00:10.654     00:00:01.087    1087
7              00:00:14.300     00:00:03.646    0
8              00:00:14.532     00:00:00.232    232
9              00:00:16.500     00:00:01.968    2200
10             00:00:17.543     00:00:01.043    0

I have explored using the modulo operator, but am only successful in resetting back to zero when the resulting cumsum is equal to the limit provided (i.e. cumsum[ms] of 500 % 500 equals zero).

Thanks in advance for any thoughts you may have, and please let me know if I can provide any more information.

like image 238
wrcobb Avatar asked Nov 25 '14 17:11

wrcobb


1 Answers

Here's an example of how you might do this by iterating over each row in the dataframe. I created new data for the example for simplicity:

df = pd.DataFrame({'TimeDelta': np.random.normal( 900, 60, size=100)})
print df.head()
    TimeDelta
0  971.021295
1  734.359861
2  867.000397
3  992.166539
4  853.281131

So let's do an accumulator loop with your desired 3000 max:

maxvalue = 3000

lastvalue = 0
newcum = []
for row in df.iterrows():
    thisvalue =  row[1]['TimeDelta'] + lastvalue
    if thisvalue > maxvalue:
        thisvalue = 0
    newcum.append( thisvalue )
    lastvalue = thisvalue

Then put the newcom list into the dataframe:

df['newcum'] = newcum
print df.head()
    TimeDelta       newcum
0  801.977678   801.977678
1  893.296429  1695.274107
2  935.303566  2630.577673
3  850.719497     0.000000
4  951.554206   951.554206
like image 84
JD Long Avatar answered Sep 20 '22 01:09

JD Long