I have a data frame like:
customer spend hurdle
A 20 50
A 31 50
A 20 50
B 50 100
B 51 100
B 30 100
I want to calculate an additional Cumulative column that resets, per customer, whenever the cumulative sum becomes greater than or equal to the hurdle, like the following:
customer spend hurdle Cumulative
A 20 50 20
A 31 50 51
A 20 50 20
B 50 100 50
B 51 100 101
B 30 100 30
I used cumsum and groupby in pandas, but I do not know how to reset the sum based on the condition. The following is the code I am currently using:
df1['cum_sum'] = df1.groupby(['customer'])['spend'].apply(lambda x: x.cumsum())
which I know is just a normal cumulative sum. I would really appreciate your help.
There could be a faster, more efficient way; here is one inefficient apply-based approach:
In [3270]: def custcum(x):
      ...:     total = 0
      ...:     for i, v in x.iterrows():
      ...:         total += v.spend
      ...:         x.loc[i, 'cum'] = total
      ...:         # reset the running total once it meets the hurdle
      ...:         if total >= v.hurdle:
      ...:             total = 0
      ...:     return x
      ...:
In [3271]: df.groupby('customer').apply(custcum)
Out[3271]:
customer spend hurdle cum
0 A 20 50 20.0
1 A 31 50 51.0
2 A 20 50 20.0
3 B 50 100 50.0
4 B 51 100 101.0
5 B 30 100 30.0
You may consider using Cython or Numba to speed up custcum.
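As a sketch of what such a compiled loop could look like (the function name resetting_cumsum is made up here, and the Numba decorator is only suggested in a comment, assuming numba is installed):

```python
import numpy as np
import pandas as pd

def resetting_cumsum(spend, hurdle):
    # Running sum that resets to zero once it meets or exceeds the hurdle.
    # Decorating this with numba.njit would compile the loop to machine code.
    out = np.empty(len(spend), dtype=np.int64)
    total = 0
    for i in range(len(spend)):
        total += spend[i]
        out[i] = total
        if total >= hurdle[i]:
            total = 0
    return out

df = pd.DataFrame({
    'customer': list('AAABBB'),
    'spend':    [20, 31, 20, 50, 51, 30],
    'hurdle':   [50, 50, 50, 100, 100, 100],
})

# Apply per customer; the groups are contiguous here, so concatenating
# the per-group results preserves the original row order.
df['cum'] = np.concatenate([
    resetting_cumsum(g['spend'].to_numpy(), g['hurdle'].to_numpy())
    for _, g in df.groupby('customer', sort=False)
])
# df['cum'] is now [20, 51, 20, 50, 101, 30]
```

Because the loop only touches NumPy arrays (no pandas objects inside), it is the kind of function Numba handles well.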
[Update]
Improved version of Ido's answer:
In [3276]: s = df.groupby('customer').spend.cumsum()
In [3277]: np.where(s > df.hurdle.shift(-1), s, df.spend)
Out[3277]: array([ 20, 51, 20, 50, 101, 30], dtype=int64)
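Run end to end with the question's data (a self-contained sketch; note the trailing NaN produced by shift(-1) makes the comparison False on the last row, so it falls back to spend):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': list('AAABBB'),
    'spend':    [20, 31, 20, 50, 51, 30],
    'hurdle':   [50, 50, 50, 100, 100, 100],
})

# Per-customer running sum, never reset.
s = df.groupby('customer').spend.cumsum()

# Keep the running sum on the row where it clears the next row's hurdle;
# everywhere else fall back to the row's own spend.
cum = np.where(s > df.hurdle.shift(-1), s, df.spend)
# cum -> array([ 20,  51,  20,  50, 101,  30])
```

This matches the expected output for this data, where each customer's sum crosses the hurdle at most once before resetting.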
One way would be the code below, but it is a really inefficient and inelegant one-liner:
df1.groupby('customer').apply(lambda x: (x['spend'].cumsum() *(x['spend'].cumsum() > x['hurdle']).astype(int).shift(-1)).fillna(x['spend']))