Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Cumulative sum of one column based on value of another

I am trying to calculate some statistics from a pandas dataframe. It looks something like this:

id     value     conditional
1      10        0
2      20        0
3      30        1
1      15        1
3      5         0
1      10        1

So, I need to calculate the cumulative sum of the column value for each id from top to botom, but only when conditional is 1.

So, this should give me something like:

id     value     conditional   cumulative sum
1      10        0             0
2      20        0             0
3      30        1             30
1      15        1             15
3      5         0             30
1      10        1             25

So, the sum of id=1 is taken only when conditional=1 in the 4th and 6th row and the 1st row value is not counted. How do I do this in pandas?

like image 650
sfactor Avatar asked Dec 25 '22 00:12

sfactor


1 Answers

You can create a Series that is the multiplication of value and conditional, and take the cumulative sum of it for each id group:

df['cumsum'] = (df['value']*df['conditional']).groupby(df['id']).cumsum()
df
Out: 
   id  value  conditional  cumsum
0   1     10            0       0
1   2     20            0       0
2   3     30            1      30
3   1     15            1      15
4   3      5            0      30
5   1     10            1      25
like image 195
ayhan Avatar answered Dec 26 '22 14:12

ayhan