Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to groupby() a column and sum() iteratively per group

If we consider the following dataframe df, which I already grouped by id and sorted by Date

| Date       |  id  |  col  | 
| 2011-10-10 | aaa  | 1100  | 
| 2010-10-10 | aaa  | 1000  | 
| 2017-10-10 | bbb  | 2300  | 
| 2012-02-02 | bbb  | 2200  | 
| 2010-02-02 | bbb  | 2100  | 
| 2009-02-02 | bbb  | 2000  |
| 2017-02-02 | ccc  | 3200  |   
| 2016-04-04 | ccc  | 3100  | 
| 2015-04-04 | ccc  | 3000  | 

I can add the column col_sum easily by grouping and summing up per group

| Date       |  id  |  col  | col_sum  | 
| 2011-10-10 | aaa  | 1100  | 2100     | 
| 2010-10-10 | aaa  | 1000  | 1000     | 
| 2017-10-10 | bbb  | 2300  | 8600     | 
| 2012-02-02 | bbb  | 2200  | 8600     | 
| 2010-02-02 | bbb  | 2100  | 8600     | 
| 2009-02-02 | bbb  | 2000  | 8600     | 
| 2017-02-02 | ccc  | 3200  | 9300     |  
| 2016-04-04 | ccc  | 3100  | 9300     | 
| 2015-04-04 | ccc  | 3000  | 9300     |

but my goal is to calculate the column col_sum_prev such that I get

| Date       |  id  |  col  | col_sum  | col_sum_prev  |
| 2011-10-10 | aaa  | 1100  | 2100     | 2100          |
| 2010-10-10 | aaa  | 1000  | 1000     | 1000          |
| 2017-10-10 | bbb  | 2300  | 8600     | 8600          |
| 2012-02-02 | bbb  | 2200  | 8600     | 6300          |
| 2010-02-02 | bbb  | 2100  | 8600     | 4100          |
| 2009-02-02 | bbb  | 2000  | 8600     | 2000          |
| 2017-02-02 | ccc  | 3200  | 9300     | 9300          |   
| 2016-04-04 | ccc  | 3100  | 9300     | 6100          | 
| 2015-04-04 | ccc  | 3000  | 9300     | 3000          | 

So for every group this column should contain the sum of the previous values of col in this id-group, where previous means, descending by Date.

The code to get to the second output is

df = pd.DataFrame({'Date': [date(2011,10,10), date(2010,10,10), date(2017,10,10), date(2012,2,2), date(2010,2,2), 
                        date(2009,2,2), date(2017,2,2), date(2016,4,4), date(2015,4,4)],
                   'id':  ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ccc'],
                   'col': [1100, 1000, 2300, 2200, 2100, 2000, 3200, 3100, 3000]})
df['col_sum'] = df.groupby(['id'])['col'].transform('sum') 

Any Ideas?

like image 975
lesem Avatar asked Nov 21 '25 06:11

lesem


1 Answers

Use GroupBy.cumsum with swapped order of rows by DataFrame.iloc and [::-1]:

df['col_sum'] = df.groupby(['id'])['col'].transform('sum') 
df['col_sum_pre'] = df.iloc[::-1].groupby(['id'])['col'].cumsum()
print (df)
         Date   id   col  col_sum  col_sum_pre
0  2011-10-10  aaa  1100     2100         2100
1  2010-10-10  aaa  1000     2100         1000
2  2017-10-10  bbb  2300     8600         8600
3  2012-02-02  bbb  2200     8600         6300
4  2010-02-02  bbb  2100     8600         4100
5  2009-02-02  bbb  2000     8600         2000
6  2017-02-02  ccc  3200     9300         9300
7  2016-04-04  ccc  3100     9300         6100
8  2015-04-04  ccc  3000     9300         3000
like image 54
jezrael Avatar answered Nov 23 '25 19:11

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!