I am trying to create a column (“consec”) which will keep a running count of consecutive values in another (“binary”) without using loop. This is what the desired outcome would look like:
. binary consec
1 0 0
2 1 1
3 1 2
4 1 3
5 1 4
5 0 0
6 1 1
7 1 2
8 0 0
However, this...
df['consec'][df['binary']==1] = df['consec'].shift(1) + df['binary']
results in this...
. binary consec
0 1 NaN
1 1 1
2 1 1
3 0 0
4 1 1
5 0 0
6 1 1
7 1 1
8 1 1
9 0 0
I see other posts which use grouping or sorting, but unfortunately, I don't see how that could work for me. Thanks in advance for your help.
You can use the compare-cumsum-groupby pattern (which I really need to getting around to writing up for the documentation), with a final cumcount
:
>>> df = pd.DataFrame({"binary": [0,1,1,1,0,0,1,1,0]})
>>> df["consec"] = df["binary"].groupby((df["binary"] == 0).cumsum()).cumcount()
>>> df
binary consec
0 0 0
1 1 1
2 1 2
3 1 3
4 0 0
5 0 0
6 1 1
7 1 2
8 0 0
This works because first we get the positions where we want to reset the counter:
>>> (df["binary"] == 0)
0 True
1 False
2 False
3 False
4 True
5 True
6 False
7 False
8 True
Name: binary, dtype: bool
The cumulative sum of these gives us a different id for each group:
>>> (df["binary"] == 0).cumsum()
0 1
1 1
2 1
3 1
4 2
5 3
6 3
7 3
8 4
Name: binary, dtype: int64
And then we can pass this to groupby
and use cumcount
to get an increasing index in each group.
For those who ended up here looking for an answer to the "misunderstood" version: To reset count for each change in the binary column, so that consec does "keep a running count of consecutive values", the following seems to work:
df["consec2"] = df["binary"].groupby((df["binary"] <> df["binary"].shift()).cumsum()).cumcount()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With