Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Resetting Cumulative Sum once a value is reached and set a flag to 1

I'm having trouble coming up with a way to perform a cumulative sum on a column and creating a flag once it hits a certain value.

So given a dataframe:

df = pd.DataFrame([[5,1],[6,1],[30,1],[170,0],[5,1],[10,1]],columns = ['a','b'])

     a  b
0    5  1
1    6  1
2   30  1
3  170  0
4    5  1
5   10  1

For column A, I want to perform the cumulative sum and set the "Flag" column value to 1 if that max value is reached. Upon reaching that max value, it will reset to 0. In this case, the max value is 40. Any cumulative sum over 40 will trigger the reset

Desired Output

     a  b  Flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0

Any help would be appreciated!

like image 707
Jihong Avatar asked Oct 28 '25 08:10

Jihong


1 Answers

"Ordinary" cumsum() is here useless, as this function "doesn't know" where to restart summation.

You can do it with the following custom function:

def myCumSum(x, thr):
    if myCumSum.prev >= thr:
        myCumSum.prev = 0
    myCumSum.prev += x
    return myCumSum.prev

This function is "with memory" (from the previous call) - prev, so there is a way to "know" where to restart.

To speed up the execution, define a vectorized version of this function:

myCumSumV = np.vectorize(myCumSum, otypes=[np.int], excluded=['thr'])

Then execute:

threshold = 40
myCumSum.prev = 0  # Set the "previous" value
# Replace "a" column with your cumulative sum
df.a = myCumSumV(df.a.values, threshold)
df['flag'] = df.a.ge(threshold).astype(int)  # Compute "flag" column

The result is:

     a  b  flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0
like image 50
Valdi_Bo Avatar answered Oct 31 '25 00:10

Valdi_Bo