How to groupby() a column and sum() iteratively per group

Question

If we consider the following dataframe df, which I already grouped by id and sorted by Date

| Date       |  id  |  col  | 
| 2011-10-10 | aaa  | 1100  | 
| 2010-10-10 | aaa  | 1000  | 
| 2017-10-10 | bbb  | 2300  | 
| 2012-02-02 | bbb  | 2200  | 
| 2010-02-02 | bbb  | 2100  | 
| 2009-02-02 | bbb  | 2000  |
| 2017-02-02 | ccc  | 3200  |   
| 2016-04-04 | ccc  | 3100  | 
| 2015-04-04 | ccc  | 3000  |

I can add the column col_sum easily by grouping and summing up per group

| Date       |  id  |  col  | col_sum  | 
| 2011-10-10 | aaa  | 1100  | 2100     | 
| 2010-10-10 | aaa  | 1000  | 1000     | 
| 2017-10-10 | bbb  | 2300  | 8600     | 
| 2012-02-02 | bbb  | 2200  | 8600     | 
| 2010-02-02 | bbb  | 2100  | 8600     | 
| 2009-02-02 | bbb  | 2000  | 8600     | 
| 2017-02-02 | ccc  | 3200  | 9300     |  
| 2016-04-04 | ccc  | 3100  | 9300     | 
| 2015-04-04 | ccc  | 3000  | 9300     |

but my goal is to calculate the column col_sum_prev such that I get

| Date       |  id  |  col  | col_sum  | col_sum_prev  |
| 2011-10-10 | aaa  | 1100  | 2100     | 2100          |
| 2010-10-10 | aaa  | 1000  | 1000     | 1000          |
| 2017-10-10 | bbb  | 2300  | 8600     | 8600          |
| 2012-02-02 | bbb  | 2200  | 8600     | 6300          |
| 2010-02-02 | bbb  | 2100  | 8600     | 4100          |
| 2009-02-02 | bbb  | 2000  | 8600     | 2000          |
| 2017-02-02 | ccc  | 3200  | 9300     | 9300          |   
| 2016-04-04 | ccc  | 3100  | 9300     | 6100          | 
| 2015-04-04 | ccc  | 3000  | 9300     | 3000          |

So for every group this column should contain the sum of the previous values of col in this id-group, where previous means, descending by Date.

The code to get to the second output is

df = pd.DataFrame({'Date': [date(2011,10,10), date(2010,10,10), date(2017,10,10), date(2012,2,2), date(2010,2,2), 
                        date(2009,2,2), date(2017,2,2), date(2016,4,4), date(2015,4,4)],
                   'id':  ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ccc'],
                   'col': [1100, 1000, 2300, 2200, 2100, 2000, 3200, 3100, 3000]})
df['col_sum'] = df.groupby(['id'])['col'].transform('sum')

Any Ideas?

jezrael · Accepted Answer

Use GroupBy.cumsum with swapped order of rows by DataFrame.iloc and [::-1]:

df['col_sum'] = df.groupby(['id'])['col'].transform('sum') 
df['col_sum_pre'] = df.iloc[::-1].groupby(['id'])['col'].cumsum()
print (df)
         Date   id   col  col_sum  col_sum_pre
0  2011-10-10  aaa  1100     2100         2100
1  2010-10-10  aaa  1000     2100         1000
2  2017-10-10  bbb  2300     8600         8600
3  2012-02-02  bbb  2200     8600         6300
4  2010-02-02  bbb  2100     8600         4100
5  2009-02-02  bbb  2000     8600         2000
6  2017-02-02  ccc  3200     9300         9300
7  2016-04-04  ccc  3100     9300         6100
8  2015-04-04  ccc  3000     9300         3000

How to groupby() a column and sum() iteratively per group

Tags:

python-3.x

pandas

dataframe

pandas-groupby

lesem

1 Answers

jezrael

Recent Activity

Donate For Us

How to groupby() a column and sum() iteratively per group

Tags:

python-3.x

pandas

dataframe

pandas-groupby

lesem

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us