
Apply cumulative sum to only one column in pandas (Python)

I would like to apply cumsum to one specific column only, because the values in the other columns must stay the same.

This is the script that I have so far:

df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()

However, this script accumulates every column in my pandas DataFrame. The only column that should be cumulatively summed is data.

As requested, here is some sample data:

import pandas as pd

df = pd.DataFrame({'ID': ["880022443344556677787", "880022443344556677782", "880022443344556677787",
                          "880022443344556677782", "880022443344556677787", "880022443344556677782",
                          "880022443344556677781"],
                   'Month': ["201701", "201701", "201702", "201702", "201703", "201703", "201703"],
                   'Usage': [20, 40, 100, 50, 30, 30, 2000],
                   'Sec': [10, 15, 20, 1, 5, 6, 30]})

                      ID   Month  Sec  Usage
0  880022443344556677787  201701   10     20
1  880022443344556677782  201701   15     40
2  880022443344556677787  201702   20    100
3  880022443344556677782  201702    1     50
4  880022443344556677787  201703    5     30
5  880022443344556677782  201703    6     30
6  880022443344556677781  201703   30   2000

Desired output

                      ID   Month  Sec  Usage
0  880022443344556677787  201701   10     20
1  880022443344556677782  201701   15     40
2  880022443344556677787  201702   20    120
3  880022443344556677782  201702    1     90
4  880022443344556677787  201703    5    150
5  880022443344556677782  201703    6    120
6  880022443344556677781  201703   30   2000
Joe_ft asked Feb 28 '26 11:02


1 Answer

Consider the dataframe df

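import numpy as np
import pandas as pd

# Two names ('a', 'b'), two days (0, 1) per name, four rows per (name, day) pair
df = pd.DataFrame(dict(
        name=list('aaaaaaaabbbbbbbb'),
        day=np.tile(np.arange(2).repeat(4), 2),
        data=np.arange(16)
    ))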

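First, restrict the cumsum to a specific column by selecting that column right after the groupby call. Here data is summed per (name, day) pair, and those sums are then accumulated within each name.

Second, add the result back to the dataframe df with join. Because d2 is also named data, rsuffix='_cum' gives the new column the name data_cum.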

d2 = df.groupby(['name', 'day']).data.sum().groupby(level=0).cumsum()

df.join(d2, on=['name', 'day'], rsuffix='_cum')

    data  day name  data_cum
0      0    0    a         6
1      1    0    a         6
2      2    0    a         6
3      3    0    a         6
4      4    1    a        28
5      5    1    a        28
6      6    1    a        28
7      7    1    a        28
8      8    0    b        38
9      9    0    b        38
10    10    0    b        38
11    11    0    b        38
12    12    1    b        92
13    13    1    b        92
14    14    1    b        92
15    15    1    b        92
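
For the sample data in the question, the same idea (select the single column after the groupby) can be applied per row instead of per group. The following is only a sketch, assuming that the running total should be taken per ID in Month order and that Month is always a sortable YYYYMM string. The name result is just an illustrative variable, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': ["880022443344556677787", "880022443344556677782", "880022443344556677787",
                          "880022443344556677782", "880022443344556677787", "880022443344556677782",
                          "880022443344556677781"],
                   'Month': ["201701", "201701", "201702", "201702", "201703", "201703", "201703"],
                   'Usage': [20, 40, 100, 50, 30, 30, 2000],
                   'Sec': [10, 15, 20, 1, 5, 6, 30]})

# Work on a copy so the original frame is left untouched
result = df.copy()

# Sort by Month (assumed YYYYMM strings, so string order equals time order),
# then accumulate Usage within each ID. Only the Usage column is selected,
# so ID, Month and Sec are not modified. The assignment aligns on the index,
# which keeps the original row order in result.
result['Usage'] = (
    df.sort_values('Month', kind='stable')
      .groupby('ID')['Usage']
      .cumsum()
)

print(result)
```

With this sample, Usage becomes 20, 40, 120, 90, 150, 120, 2000, which matches the desired output in the question, while Sec is unchanged.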
piRSquared answered Mar 02 '26 23:03