Reversed cumulative sum of a column in pandas.DataFrame

Tags:

I've got a pandas DataFrame with a boolean column sorted by another column and need to calculate reverse cumulative sum of the boolean column, that is, amount of true values from current row to bottom.

Example

In [13]: df = pd.DataFrame({'A': [True] * 3 + [False] * 5, 'B': np.random.rand(8) })  In [15]: df = df.sort_values('B')  In [16]: df Out[16]:        A         B 6  False  0.037710 2   True  0.315414 4  False  0.332480 7  False  0.445505 3  False  0.580156 1   True  0.741551 5  False  0.796944 0   True  0.817563

I need something that will give me a new column with values

3 3 2 2 2 2 1 1

That is, for each row it should contain amount of True values on this row and rows below.

I've tried various methods using .iloc[::-1] but result is not that is desired.

It looks like I'm missing some obvious bit of information. I've starting using Pandas only yesterday.

528

asked Jun 17 '16 03:06

wl2776

2 Answers

Reverse column A, take the cumsum, then reverse again:

df['C'] = df.loc[::-1, 'A'].cumsum()[::-1]

import pandas as pd df = pd.DataFrame(     {'A': [False, True, False, False, False, True, False, True],      'B': [0.03771, 0.315414, 0.33248, 0.445505, 0.580156, 0.741551, 0.796944, 0.817563],},      index=[6, 2, 4, 7, 3, 1, 5, 0]) df['C'] = df.loc[::-1, 'A'].cumsum()[::-1] print(df)

yields

       A         B  C 6  False  0.037710  3 2   True  0.315414  3 4  False  0.332480  2 7  False  0.445505  2 3  False  0.580156  2 1   True  0.741551  2 5  False  0.796944  1 0   True  0.817563  1

Alternatively, you could count the number of Trues in column A and subtract the (shifted) cumsum:

In [113]: df['A'].sum()-df['A'].shift(1).fillna(0).cumsum() Out[113]:  6    3 2    3 4    2 7    2 3    2 1    2 5    1 0    1 Name: A, dtype: object

But this is significantly slower. Using IPython to perform the benchmark:

In [116]: df = pd.DataFrame({'A':np.random.randint(2, size=10**5).astype(bool)})  In [117]: %timeit df['A'].sum()-df['A'].shift(1).fillna(0).cumsum() 10 loops, best of 3: 19.8 ms per loop  In [118]: %timeit df.loc[::-1, 'A'].cumsum()[::-1] 1000 loops, best of 3: 701 µs per loop

190

answered Sep 21 '22 09:09

unutbu

Similar to unutbus first suggestion, but without the deprecated ix:

df['C']=df.A[::-1].cumsum()

answered Sep 23 '22 09:09

Ichta

Related questions
                            
                                How to merge branches in 2 different repositories?
                            
                                Installing R on Linux: configure: error: libcurl >= 7.28.0 library and headers are required with support for https
                            
                                Conflict on bitbucket remote server but everything is up to date locally
                            
                                Can't bind to 'routerLink' since it isn't a known property of 'a' [duplicate]
                            
                                How do I import the new logging functions in iOS 10 / Sierra / Xcode 8?
                            
                                Angular-2 : Change favicon icon as per configuration
                            
                                "Load" event on script with async and/or defer
                            
                                Passing a lambda into a function template
                            
                                How to call component method from service? (angular2)
                            
                                Closing connection when using Dapper
                            
                                Start celery worker throws "no attribute 'worker_state_db'"
                            
                                How to get Form Data in Angular 2

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With