
Python pandas cumsum with reset every time there is a 0

I have a matrix with 0s and 1s, and want to do a cumsum on each column that resets to 0 whenever a zero is observed. For example, if we have the following:

import pandas as pd
df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
print(df)
   a  b
0  0  1
1  1  1
2  0  1
3  1  0
4  1  1
5  0  1

The result I desire is:

print(df)
   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2

However, when I try df.cumsum() * df, I am able to correctly identify the 0 elements, but the counter does not reset:

print(df.cumsum() * df)
   a  b
0  0  1
1  1  2
2  0  3
3  2  0
4  3  4
5  0  5
nanojohn asked Aug 30 '17 15:08



2 Answers

You can use:

a = df != 0
df1 = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
print (df1)
   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2
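
To see why this works, it can help to look at the intermediate frames one at a time. The sketch below recomputes the same expression step by step on the df from the question. The names running, at_zero and baseline are illustrative only and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]], columns=['a','b'])

a = df != 0                    # True where the value is non-zero
running = a.cumsum()           # plain running count per column, never resets
at_zero = running.where(~a)    # keep the running count only on zero rows, NaN elsewhere
# Carry the count seen at the most recent zero forward; rows before any zero get 0.
baseline = at_zero.ffill().fillna(0).astype(int)
df1 = running - baseline       # count accumulated since the most recent zero

print(baseline)
print(df1)
```

Subtracting baseline removes everything that was counted before the latest zero, which is what makes the counter appear to restart.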
jezrael answered Sep 20 '22 10:09



You may also try the following naive but reliable approach.

For each column, create groups to count within. A new group starts whenever a value differs from the one in the previous row, and lasts for as long as the value stays constant: (x != x.shift()).cumsum().
For the df from the question, this gives the following group ids:

    a   b
0   1   1
1   2   1
2   3   1
3   4   2
4   4   3
5   5   3
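
The group ids in this table can be reproduced with a short sketch like the one below, assuming the same df as in the question. The name run_ids is illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]], columns=['a','b'])

# A new run starts wherever a value differs from the one in the row above.
# The first row always counts as a change, because shift() puts NaN there.
run_ids = df.apply(lambda x: (x != x.shift()).cumsum())
print(run_ids)
```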

Calculate cumulative sums within those groups, column by column, using DataFrame.apply and Series.groupby, and you get the cumsum with the zero reset in one line:

import pandas as pd
df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])

cs = df.apply(lambda x: x.groupby((x != x.shift()).cumsum()).cumsum())
print(cs)

   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2
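
One caveat: grouping on runs of equal values fits the 0/1 data in the question, but a column with other non-zero values (say 2 followed by 3) would start a new group at every change. The following is a hedged variant, not something this answer proposes, that opens a new group only at zeros. The helper name cumsum_reset_at_zero and the frame wide are hypothetical.

```python
import pandas as pd

def cumsum_reset_at_zero(frame):
    """Column-wise cumulative sum that restarts at every zero."""
    # (x == 0).cumsum() increases by one at each zero, so a zero and the
    # values that follow it (up to the next zero) share one group id.
    return frame.apply(lambda x: x.groupby((x == 0).cumsum()).cumsum())

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]], columns=['a','b'])
print(cumsum_reset_at_zero(df))

# Non-binary example data, made up for illustration.
wide = pd.DataFrame({'c': [2, 3, 0, 4, 1, 0]})
print(cumsum_reset_at_zero(wide))
```

On the question's df this should give the same frame as cs above; on wide it accumulates 2 and 3 together before resetting at the zero.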
George Shimanovsky answered Sep 19 '22 10:09
