Perform cumulative sum over a column but reset to 0 if the sum becomes negative in Pandas

Question

I have a pandas dataframe with two columns like this,

Item    Value
0   A   7
1   A   2
2   A   -6
3   A   -70
4   A   8
5   A   0

I want to compute a cumulative sum over the column Value, but whenever the running sum becomes negative, I want to reset it to 0.

I am currently using the loop shown below to do this:

sum_ = 0
cumsum = []

for val in sample['Value'].values:
    sum_ += val
    if sum_ < 0:
        sum_ = 0
    cumsum.append(sum_)

print(cumsum) # [7, 9, 3, 0, 8, 8]

I am looking for a more efficient way to perform this in pure pandas.

BENY · Accepted Answer

A slight modification of your approach. Note that this method is slower than the numba solution.

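import numpy as np

# Binary ufunc: add, but clamp a negative running total to 0
sumlm = np.frompyfunc(lambda a, b: 0 if a + b < 0 else a + b, 2, 1)
# np.object was removed in NumPy 1.24; use the builtin object instead
newx = sumlm.accumulate(df.Value.values, dtype=object)
newx
Out[147]: array([7, 9, 3, 0, 8, 8], dtype=object)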
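One caveat worth checking with this approach: `accumulate` copies the first element through unchanged, so a negative first value would not be reset to 0. Below is a minimal sketch of one possible workaround, assuming it is acceptable to prepend a zero and drop it afterwards. The name `clamped_cumsum` is hypothetical and not from the answer.

```python
import numpy as np

sumlm = np.frompyfunc(lambda a, b: 0 if a + b < 0 else a + b, 2, 1)

def clamped_cumsum(values):
    # Hypothetical helper: seed with 0 so the first value also goes through the clamp.
    padded = np.concatenate(([0], values))
    out = sumlm.accumulate(padded, dtype=object)[1:]
    # Convert back from object dtype to the input's numeric dtype.
    return out.astype(np.asarray(values).dtype)

print(clamped_cumsum(np.array([7, 2, -6, -70, 8, 0])))  # [7 9 3 0 8 8]
print(clamped_cumsum(np.array([-5, 3])))                # [0 3]
```

Without the leading zero, the second call would return -5 as its first element instead of 0.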

Numba solution:

from numba import njit
@njit
def cumli(x, lim):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total < lim:
            total = 0
        result.append(total)
    return result
cumli(df.Value.values,0)
Out[166]: [7, 9, 3, 0, 8, 8]
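For comparison, when the threshold is exactly 0 the same result can be obtained without an explicit loop. This is a sketch of an alternative technique, not part of the answer above: it relies on the identity that the clamped running sum equals the plain cumulative sum minus its running minimum capped at 0. The frame `df` is rebuilt here from the question's sample data, and the column name `CumSum` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Item": list("AAAAAA"), "Value": [7, 2, -6, -70, 8, 0]})

c = df["Value"].cumsum()
# Lowest point reached so far, capped at 0 (no correction while the sum stays non-negative).
lowest = c.cummin().clip(upper=0)
df["CumSum"] = c - lowest

print(df["CumSum"].tolist())  # [7, 9, 3, 0, 8, 8]
```

This does not generalize to a non-zero `lim` as in `cumli`, and with floating-point data the result may differ from the loop by rounding error.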

max9111 · Answer

This is only a comment on WeNYoBen's answer.

If you can avoid lists, it is usually advisable to do so.

Example

from numba import njit
import numpy as np

#with lists
@njit()
def cumli(x, lim):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total < lim:
            total = 0
        result.append(total)
    return result

#without lists
@njit()
def cumli_2(x, lim):
    total = 0.
    result = np.empty_like(x)
    for i, y in enumerate(x):
        total += y
        if total < lim:
            total = 0.
        result[i]=total
    return result

Timings

Without Numba (comment out @njit()):

x=(np.random.rand(1_000)-0.5)*5

  %timeit a=cumli(x, 0.)
  220 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  %timeit a=cumli_2(x, 0.)
  227 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Without compilation, there is practically no difference between using lists and arrays. That is no longer the case once you JIT-compile the function.

With Numba:

  %timeit a=cumli(x, 0.)
  27.4 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  %timeit a=cumli_2(x, 0.)
  2.96 µs ± 32.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Even in somewhat more complicated cases (final array size unknown, or only the maximum array size known), it often makes sense to allocate an array and shrink it at the end, or in simple cases even to run the algorithm once to determine the final array size and then do the real calculation.
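The allocate-then-shrink idea can be sketched as follows. This is an illustrative example, not code from the answer: the hypothetical function `reset_positions` records the indices at which the running sum was reset, so the output size is unknown in advance but bounded by `len(x)`. It is written in plain NumPy; decorating it with `@njit` should be possible, but that is an untested assumption here.

```python
import numpy as np

def reset_positions(x):
    # Hypothetical helper: allocate for the worst case (a reset at every element).
    out = np.empty(x.shape[0], dtype=np.int64)
    n = 0          # number of slots actually used
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i]
        if total < 0.0:
            total = 0.0
            out[n] = i
            n += 1
    # Shrink to the used part; copy so the oversized buffer can be freed.
    return out[:n].copy()

print(reset_positions(np.array([7., 2., -6., -70., 8., 0.])))  # [3]
print(reset_positions(np.array([-1., -1., 2., -5.])))          # [0 1 3]
```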

Tags:

python

pandas

Sreeram TP
