Pandas group by cumsum keep columns

Tags:

pandas

group-by

cumsum

I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:

I have a dataframe:

df1
Out[8]:
   Name        Date  Amount
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29       8
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

I am trying to

1. group by ['Name','Date'] and
2. cumsum 'Amount'.
3. That is it.

So the desired output is:

df1
Out[10]:
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15

EDIT: I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.

asked Jan 23 '17 14:01

gmarais

2 Answers

You need to assign the output to a new column and then remove the Amount column with drop:

df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

Another solution with assign:

df1 = (df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
          .drop('Amount', axis=1))
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5
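
Both snippets print 5 and 13 for Jack on 2016-02-29 because cumsum restarts inside every (Name, Date) group. As a rough sketch, with the question's data rebuilt by hand and the variable names `per_name_date` and `per_name` chosen only for illustration, compare that with grouping by Name alone:

```python
import pandas as pd

# The question's data, typed in by hand
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29', '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# cumsum restarts for every distinct (Name, Date) pair
per_name_date = df1.groupby(['Name', 'Date'])['Amount'].cumsum()

# cumsum restarts only when Name changes, so it runs across dates
per_name = df1.groupby('Name')['Amount'].cumsum()

print(per_name_date.tolist())  # [10, 5, 13, 10, 5]
print(per_name.tolist())       # [10, 15, 23, 10, 15]
```

Grouping by Name alone gives a running total, but still one value per original row, so Jack's duplicated date shows 15 and then 23. It also assumes the rows are already in date order within each name.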

EDIT by comment:

First group by the columns Name and Date and aggregate with sum, then group by the index level Name and apply cumsum.

df = (df1.groupby(by=['Name','Date'])['Amount'].sum()
         .groupby(level='Name').cumsum().reset_index(name='Cumsum'))
print (df)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15
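
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the question's frame by hand and splits the chain into its two steps. The intermediate names `daily`, `running` and `result` are mine, not part of the answer, and the dates are kept as ISO-format strings, which happen to sort chronologically; with real data, converting them with pd.to_datetime first is probably safer.

```python
import pandas as pd

# The question's data, typed in by hand
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29', '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# Step 1: one total per (Name, Date); Jack's two 2016-02-29 rows collapse to 13
daily = df1.groupby(['Name', 'Date'])['Amount'].sum()

# Step 2: running total within each Name, using the 'Name' index level
running = daily.groupby(level='Name').cumsum()

# Move the (Name, Date) index back into columns and name the value column
result = running.reset_index(name='Cumsum')
print(result)
```

Splitting the chain makes it easy to inspect `daily` and confirm that same-day entries were summed before the running total was taken.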

answered Sep 27 '22 20:09

jezrael

Set the index first, then groupby.

df1.set_index(['Name', 'Date']).groupby(level=[0, 1]).Amount.cumsum().reset_index()

After the OP changed their question, this is now the correct answer.

df1.groupby(
    ['Name','Date']
).Amount.sum().groupby(
    level='Name'
).cumsum()

This is the same answer as the one provided by jezrael.

answered Sep 27 '22 19:09

piRSquared
