The cumsum() method returns a DataFrame with the cumulative sum of the values. By default it works down each column, row by row: each value is added to the running total from the rows above it, so the last row ends up holding the sum of all values in each column.
Cumulative sums, or running totals, are used to display the total sum of data as it grows with time (or any other series or progression). This lets you view the total contribution so far of a given measure against time.
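As a minimal sketch of the default, column-wise behaviour (the frame below is made up purely for illustration):

import pandas as pd

# hypothetical example frame, just to illustrate DataFrame.cumsum()
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# cumulative sum down each column (axis=0 is the default)
print(df.cumsum())
#    a   b
# 0  1  10
# 1  3  30
# 2  6  60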
This should do it; you need groupby() twice:
df.groupby(['name', 'day']).sum() \
  .groupby(level=0).cumsum().reset_index()
Explanation:
print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                 no
name day
Jack Monday      10
     Tuesday     30
     Wednesday   50
Jill Monday      40
     Wednesday  110
# cumulative sum per name/day
print( df.groupby(['name', 'day']).sum() \
       .groupby(level=0).cumsum() )

                 no
name day
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150
The DataFrame resulting from the first sum is indexed by 'name' and by 'day'. You can see this by printing df.groupby(['name', 'day']).sum().index.
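Printing that index shows the two levels; the exact repr differs a little between pandas versions, but it looks roughly like this:

print(df.groupby(['name', 'day']).sum().index)
# MultiIndex([('Jack',    'Monday'),
#             ('Jack',   'Tuesday'),
#             ('Jack', 'Wednesday'),
#             ('Jill',    'Monday'),
#             ('Jill', 'Wednesday')],
#            names=['name', 'day'])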
When computing the cumulative sum, you want to do so by 'name', which corresponds to the first index (level 0). Finally, use reset_index to have the names repeated.
df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150
Modification to @Dmitry's answer. This is simpler and works in pandas 0.19.0:
print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
df['no_csum'] = df.groupby(['name'])['no'].cumsum()
print(df)
   name        day   no  no_csum
0  Jack     Monday   10       10
1  Jack    Tuesday   20       30
2  Jack    Tuesday   10       40
3  Jack  Wednesday   50       90
4  Jill     Monday   40       40
5  Jill  Wednesday  110      150
This works in pandas 0.16.2
In[23]: print df
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print df
   name        day   no  no_cumulative
0  Jack     Monday   10             10
1  Jack    Tuesday   20             30
2  Jack    Tuesday   10             40
3  Jack  Wednesday   50             90
4  Jill     Monday   40             40
5  Jill  Wednesday  110            150
You should use
df['cum_no'] = df.no.cumsum()
http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html
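Note that this computes a plain running total over the whole column and ignores the name groups; on the question's example data it would look roughly like this:

# after df['cum_no'] = df.no.cumsum()
print(df)
#    name        day   no  cum_no
# 0  Jack     Monday   10      10
# 1  Jack    Tuesday   20      30
# 2  Jack    Tuesday   10      40
# 3  Jack  Wednesday   50      90
# 4  Jill     Monday   40     130
# 5  Jill  Wednesday  110     240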
Another way of doing it:

import pandas as pd

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'],
                   'C2': [1, 2, 3, 4, 5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df
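Passing the built-in operation name as a string should work here as well and avoids the lambda; a rough sketch of the expected result:

# same grouped running total, using the operation name instead of a lambda
df['cumsum'] = df.groupby('C1')['C2'].transform('cumsum')
print(df)
#   C1  C2  cumsum
# 0   a   1       1
# 1   a   2       3
# 2   a   3       6
# 3   b   4       4
# 4   b   5       9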
Instead of df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum() (see above), you could also do df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum(). The df.groupby(by=['name','day']).sum() step is really just moving both columns into a MultiIndex, and as_index=False means you do not need to call reset_index afterwards.
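For completeness, a minimal self-contained sketch of that set_index variant on the question's data (shown here without the as_index flag); unlike the groupby().sum() route, the two Jack/Tuesday rows stay separate, and the commented output is approximate:

import pandas as pd

df = pd.DataFrame({'name': ['Jack', 'Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
                   'day': ['Monday', 'Tuesday', 'Tuesday', 'Wednesday',
                           'Monday', 'Wednesday'],
                   'no': [10, 20, 10, 50, 40, 110]})

# move both key columns into the index, then take the running total per name (level 0)
print(df.set_index(['name', 'day']).groupby(level=0).cumsum())
#                  no
# name day
# Jack Monday      10
#      Tuesday     30
#                  40
#      Wednesday   90
# Jill Monday      40
#      Wednesday  150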
data.csv:
name,day,no
Jack,Monday,10
Jack,Tuesday,20
Jack,Tuesday,10
Jack,Wednesday,50
Jill,Monday,40
Jill,Wednesday,110
Code:
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
df = df.groupby(['name', 'day'])['no'].sum().reset_index()
print(df)
df['cumsum'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
print(df)
Output:
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   30
2  Jack  Wednesday   50
3  Jill     Monday   40
4  Jill  Wednesday  110

   name        day   no  cumsum
0  Jack     Monday   10      10
1  Jack    Tuesday   30      40
2  Jack  Wednesday   50      90
3  Jill     Monday   40      40
4  Jill  Wednesday  110     150