If we consider the following dataframe df, which I already grouped by id and sorted by Date
| Date | id | col |
| 2011-10-10 | aaa | 1100 |
| 2010-10-10 | aaa | 1000 |
| 2017-10-10 | bbb | 2300 |
| 2012-02-02 | bbb | 2200 |
| 2010-02-02 | bbb | 2100 |
| 2009-02-02 | bbb | 2000 |
| 2017-02-02 | ccc | 3200 |
| 2016-04-04 | ccc | 3100 |
| 2015-04-04 | ccc | 3000 |
I can add the column col_sum easily by grouping and summing up per group
| Date | id | col | col_sum |
| 2011-10-10 | aaa | 1100 | 2100 |
| 2010-10-10 | aaa | 1000 | 1000 |
| 2017-10-10 | bbb | 2300 | 8600 |
| 2012-02-02 | bbb | 2200 | 8600 |
| 2010-02-02 | bbb | 2100 | 8600 |
| 2009-02-02 | bbb | 2000 | 8600 |
| 2017-02-02 | ccc | 3200 | 9300 |
| 2016-04-04 | ccc | 3100 | 9300 |
| 2015-04-04 | ccc | 3000 | 9300 |
but my goal is to calculate the column col_sum_prev such that I get
| Date | id | col | col_sum | col_sum_prev |
| 2011-10-10 | aaa | 1100 | 2100 | 2100 |
| 2010-10-10 | aaa | 1000 | 1000 | 1000 |
| 2017-10-10 | bbb | 2300 | 8600 | 8600 |
| 2012-02-02 | bbb | 2200 | 8600 | 6300 |
| 2010-02-02 | bbb | 2100 | 8600 | 4100 |
| 2009-02-02 | bbb | 2000 | 8600 | 2000 |
| 2017-02-02 | ccc | 3200 | 9300 | 9300 |
| 2016-04-04 | ccc | 3100 | 9300 | 6100 |
| 2015-04-04 | ccc | 3000 | 9300 | 3000 |
So for every group this column should contain the sum of the previous values of col in this id-group, where previous means, descending by Date.
The code to get to the second output is
df = pd.DataFrame({'Date': [date(2011,10,10), date(2010,10,10), date(2017,10,10), date(2012,2,2), date(2010,2,2),
date(2009,2,2), date(2017,2,2), date(2016,4,4), date(2015,4,4)],
'id': ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ccc'],
'col': [1100, 1000, 2300, 2200, 2100, 2000, 3200, 3100, 3000]})
df['col_sum'] = df.groupby(['id'])['col'].transform('sum')
Any Ideas?
Use GroupBy.cumsum with swapped order of rows by DataFrame.iloc and [::-1]:
df['col_sum'] = df.groupby(['id'])['col'].transform('sum')
df['col_sum_pre'] = df.iloc[::-1].groupby(['id'])['col'].cumsum()
print (df)
Date id col col_sum col_sum_pre
0 2011-10-10 aaa 1100 2100 2100
1 2010-10-10 aaa 1000 2100 1000
2 2017-10-10 bbb 2300 8600 8600
3 2012-02-02 bbb 2200 8600 6300
4 2010-02-02 bbb 2100 8600 4100
5 2009-02-02 bbb 2000 8600 2000
6 2017-02-02 ccc 3200 9300 9300
7 2016-04-04 ccc 3100 9300 6100
8 2015-04-04 ccc 3000 9300 3000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With