 

Pandas Python Groupby Cumulative Sum Reverse

I found the question "Pandas groupby cumulative sum" very useful. However, I would like to know how to calculate a reverse cumulative sum.

The link suggests the following.

df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()
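As a rough illustration, here is that one-liner run on a small frame. The frame is an assumption: it is rebuilt from the name, day and no columns of the question's data, which only appears further down.

```python
import pandas as pd

# Assumed sample data: the name/day/no columns from the question.
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'no': [10, 30, 50, 40, 40],
})

# Sum per (name, day), then take a running total within each name (index level 0).
forward = df.groupby(by=['name', 'day']).sum().groupby(level=[0]).cumsum()
print(forward)
```

Each name's no values accumulate from top to bottom (10, 40, 90 for Jack; 40, 80 for Jill). The reverse version should accumulate from bottom to top instead.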

To compute a reverse cumulative sum, I tried slicing the data, but it fails:

df.groupby(by=['name','day']).ix[::-1, 'no'].sum().groupby(level=[0]).cumsum()

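Desired output (the last column is the reverse cumulative sum of no within each name):

name | day       | no | reverse cumsum
Jack | Monday    | 10 | 90
Jack | Tuesday   | 30 | 80
Jack | Wednesday | 50 | 50
Jill | Monday    | 40 | 80
Jill | Wednesday | 40 | 40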
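One alternative that avoids reversing rows altogether (a different technique from the one in the linked question, shown only as a sketch): the reverse running total equals the group total minus the forward running total plus the current value. This assumes the rows are already in the desired order and that each (name, day) pair appears only once, as in the table above.

```python
import pandas as pd

# Same five rows as the desired-output table.
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'no': [10, 30, 50, 40, 40],
})

g = df.groupby('name')['no']
# total - forward cumsum + current value = current row plus all later rows in the group.
df['reverse_cumsum'] = g.transform('sum') - g.cumsum() + df['no']
print(df)
```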

EDIT: Based on feedback, I tried the suggested code on a larger dataframe:

import pandas as pd
df = pd.DataFrame(
    {'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
     'surname' : ['Jones','Jones','Jones','Smith','Smith'],
     'car' : ['VW','Mazda','VW','Merc','Merc'],
     'country' : ['UK','US','UK','EU','EU'],
     'year' : [1980,1980,1980,1980,1980],
     'day': ['Monday', 'Tuesday','Wednesday','Monday','Wednesday'],
     'date': ['2016-02-31','2016-01-31','2016-01-31','2016-01-31','2016-01-31'],
     'no': [10,30,50,40,40],
     'qty' : [100,500,200,433,222]})

I then group on several columns, but the cumulative sum does not respect the grouping.

df = df.groupby(by=['name','surname','car','country','year','day','date']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1].reset_index()

Why is this the case? I expect Jack Jones with the Mazda to get a separate cumulative quantity from Jack Jones with the VW.
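A sketch that reruns the pipeline above on the same data and keeps the intermediate result before reset_index, to show where the grouping goes wrong: level=[0] refers only to the first index level (name), so both of Jack's cars land in one group.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
     'surname': ['Jones', 'Jones', 'Jones', 'Smith', 'Smith'],
     'car': ['VW', 'Mazda', 'VW', 'Merc', 'Merc'],
     'country': ['UK', 'US', 'UK', 'EU', 'EU'],
     'year': [1980, 1980, 1980, 1980, 1980],
     'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
     'date': ['2016-02-31', '2016-01-31', '2016-01-31', '2016-01-31', '2016-01-31'],
     'no': [10, 30, 50, 40, 40],
     'qty': [100, 500, 200, 433, 222]})

keys = ['name', 'surname', 'car', 'country', 'year', 'day', 'date']
summed = df.groupby(by=keys).sum()

# Grouping by level 0 only: the running total flows from Jack's VW rows
# into his Mazda row, because all three rows share name == 'Jack'.
wrong = summed.iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]
print(wrong)
```

The Mazda row ends up with no = 90 (its own 30 plus the VW rows' 60) instead of 30.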

asked Sep 19 '17 14:09 by Travis




1 Answer

You can use iloc twice: reverse the rows, take the cumulative sum per group, then reverse them back:

df = df.groupby(by=['name','day']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]
print (df)
                no
name day          
Jack Monday     90
     Tuesday    80
     Wednesday  50
Jill Monday     80
     Wednesday  40
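Note that "reverse" here means reverse of the groupby result's sorted index order. The day strings sort alphabetically, which happens to match calendar order for Monday, Tuesday and Wednesday, but a Thursday row would sort before Tuesday. A sketch, assuming calendar order is what you want, that makes the order explicit with an ordered categorical:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'no': [10, 30, 50, 40, 40],
})

# Assumption: days should follow the calendar, not alphabetical order.
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
        'Saturday', 'Sunday']
df['day'] = pd.Categorical(df['day'], categories=week, ordered=True)

# observed=True keeps only (name, day) pairs that actually occur in the data;
# the categorical 'day' level is sorted by category (weekday) order.
summed = df.groupby(by=['name', 'day'], observed=True).sum()

# Reverse, running total per name, then restore the original order.
out = summed.iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]
print(out)
```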

If you want the result in a new column instead, the solution is simpler:

df = df.groupby(by=['name','day']).sum()
df['new'] = df.iloc[::-1].groupby(level=[0]).cumsum()
print (df)
                no  new
name day               
Jack Monday     10   90
     Tuesday    30   80
     Wednesday  50   50
Jill Monday     40   80
     Wednesday  40   40
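The right-hand side above is a one-column DataFrame, which fits into a single new column only while the frame has one value column. A sketch assuming an extra numeric column (qty, borrowed from the question's edited data): selecting 'no' first makes the right-hand side a Series, and assignment aligns it back by index, so the reversed order does not matter.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'no': [10, 30, 50, 40, 40],
    'qty': [100, 500, 200, 433, 222],
})

summed = df.groupby(by=['name', 'day']).sum()
# Series on the right-hand side: aligned to summed's index on assignment.
summed['new'] = summed['no'].iloc[::-1].groupby(level=[0]).cumsum()
print(summed)
```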

EDIT:

The problem is in the second groupby: it needs more levels. level=[0,1,2] groups by the first three index levels, i.e. name, surname and car.

df1 = (df.groupby(by=['name','surname','car','country','year','day','date'])
        .sum())
print (df1)
                                                      no  qty
name surname car   country year day       date               
Jack Jones   Mazda US      1980 Tuesday   2016-01-31  30  500
             VW    UK      1980 Monday    2016-02-31  10  100
                                Wednesday 2016-01-31  50  200
Jill Smith   Merc  EU      1980 Monday    2016-01-31  40  433
                                Wednesday 2016-01-31  40  222

df2 = (df.groupby(by=['name','surname','car','country','year','day','date'])
        .sum()
        .iloc[::-1]
        .groupby(level=[0,1,2])
        .cumsum()
        .iloc[::-1]
        .reset_index())
print (df2)
   name surname    car country  year        day        date  no  qty
0  Jack   Jones  Mazda      US  1980    Tuesday  2016-01-31  30  500
1  Jack   Jones     VW      UK  1980     Monday  2016-02-31  60  300
2  Jack   Jones     VW      UK  1980  Wednesday  2016-01-31  50  200
3  Jill   Smith   Merc      EU  1980     Monday  2016-01-31  80  655
4  Jill   Smith   Merc      EU  1980  Wednesday  2016-01-31  40  222

Alternatively, you can select the levels by name; see the groupby enhancements in pandas 0.20.1+:

df2 = (df.groupby(by=['name','surname','car','country','year','day','date'])
        .sum()
        .iloc[::-1]
        .groupby(['name','surname','car'])
        .cumsum()
        .iloc[::-1]
        .reset_index())
print (df2)

   name surname    car country  year        day        date  no  qty
0  Jack   Jones  Mazda      US  1980    Tuesday  2016-01-31  30  500
1  Jack   Jones     VW      UK  1980     Monday  2016-02-31  60  300
2  Jack   Jones     VW      UK  1980  Wednesday  2016-01-31  50  200
3  Jill   Smith   Merc      EU  1980     Monday  2016-01-31  80  655
4  Jill   Smith   Merc      EU  1980  Wednesday  2016-01-31  40  222
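If this pattern is needed repeatedly, it can be wrapped in a small function. The helper name reverse_group_cumsum and its parameters are hypothetical, not part of pandas; it passes the level names to the level argument of groupby, which accepts names as well as positions.

```python
import pandas as pd


def reverse_group_cumsum(frame, keys, by):
    """Hypothetical helper: sum `frame` over `keys`, then take a reverse
    running total within each group of the index levels named in `by`."""
    summed = frame.groupby(by=keys).sum()
    return (summed.iloc[::-1]
                  .groupby(level=by)
                  .cumsum()
                  .iloc[::-1]
                  .reset_index())


df = pd.DataFrame(
    {'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
     'surname': ['Jones', 'Jones', 'Jones', 'Smith', 'Smith'],
     'car': ['VW', 'Mazda', 'VW', 'Merc', 'Merc'],
     'country': ['UK', 'US', 'UK', 'EU', 'EU'],
     'year': [1980, 1980, 1980, 1980, 1980],
     'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
     'date': ['2016-02-31', '2016-01-31', '2016-01-31', '2016-01-31', '2016-01-31'],
     'no': [10, 30, 50, 40, 40],
     'qty': [100, 500, 200, 433, 222]})

keys = ['name', 'surname', 'car', 'country', 'year', 'day', 'date']
result = reverse_group_cumsum(df, keys, by=['name', 'surname', 'car'])
print(result)
```

This should print the same table as the two versions above.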
answered Sep 16 '22 22:09 by jezrael