My input looks like the DataFrame below.
I need to group by columns (A, B), count the length of each run of consecutive zeros within each group, and write it to a new column "Consec_zero_count".
Input:
A B DATE hour measure
A10 1 1/1/2014 0 0
A10 1 1/1/2014 1 0
A10 1 1/1/2014 2 0
A10 1 1/1/2014 3 0
A10 2 1/1/2014 4 0
A10 2 1/1/2014 5 1
A10 2 1/1/2014 6 2
A10 3 1/1/2014 7 0
A11 1 1/1/2014 8 0
A11 1 1/1/2014 9 0
A11 1 1/1/2014 10 2
A11 1 1/1/2014 11 0
A11 1 1/1/2014 12 0
A12 2 1/1/2014 13 1
A12 2 1/1/2014 14 3
A12 2 1/1/2014 15 0
A12 4 1/1/2014 16 5
A12 4 1/1/2014 17 0
A12 6 1/1/2014 18 0
I tried using the "groupby" technique to get the groups, but counting the consecutive zeros within each group is the part I am stuck on. I tried a lambda function, but that counts the total number of zeros, while I am interested in runs of consecutive zeros. I want my output to look like this:
Output
A B DATE hour measure Consec_zero_count
A10 1 1/1/2014 0 0 4
A10 1 1/1/2014 1 0 4
A10 1 1/1/2014 2 0 4
A10 1 1/1/2014 3 0 4
A10 2 1/1/2014 4 0 1
A10 2 1/1/2014 5 1 0
A10 2 1/1/2014 6 2 0
A10 3 1/1/2014 7 0 1
A11 1 1/1/2014 8 0 2
A11 1 1/1/2014 9 0 2
A11 1 1/1/2014 10 2 0
A11 1 1/1/2014 11 0 2
A11 1 1/1/2014 12 0 2
A12 2 1/1/2014 13 1 0
A12 2 1/1/2014 14 3 0
A12 2 1/1/2014 15 0 1
A12 4 1/1/2014 16 5 0
A12 4 1/1/2014 17 0 1
A12 6 1/1/2014 18 0 1
Any leads would be appreciated. Thanks in advance!
Create a helper Series of unique group ids for runs of consecutive values by comparing each value with its shifted neighbour via ne (!=) and taking the cumsum. Then groupby (A, B) together with this helper and use transform with size to get the length of each run. Last, filter the lengths to only the rows where measure is 0 with numpy.where:
import numpy as np

g = df['measure'].ne(df['measure'].shift()).cumsum()               # new id whenever measure changes
counts = df.groupby(['A', 'B', g])['measure'].transform('size')    # length of each run per (A, B)
df['Consec_zero_count'] = np.where(df['measure'].eq(0), counts, 0) # keep the length only for zero rows
print(df)
A B DATE hour measure Consec_zero_count
0 A10 1 1/1/2014 0 0 4
1 A10 1 1/1/2014 1 0 4
2 A10 1 1/1/2014 2 0 4
3 A10 1 1/1/2014 3 0 4
4 A10 2 1/1/2014 4 0 1
5 A10 2 1/1/2014 5 1 0
6 A10 2 1/1/2014 6 2 0
7 A10 3 1/1/2014 7 0 1
8 A11 1 1/1/2014 8 0 2
9 A11 1 1/1/2014 9 0 2
10 A11 1 1/1/2014 10 2 0
11 A11 1 1/1/2014 11 0 2
12 A11 1 1/1/2014 12 0 2
13 A12 2 1/1/2014 13 1 0
14 A12 2 1/1/2014 14 3 0
15 A12 2 1/1/2014 15 0 1
16 A12 4 1/1/2014 16 5 0
17 A12 4 1/1/2014 17 0 1
18 A12 6 1/1/2014 18 0 1
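To make the helper Series concrete, here is a minimal sketch (my own illustration, using a hand-built subset of the sample data rather than anything from the answer) that prints g next to measure. Note that g alone would merge the zeros of B=1 and B=2 into one run, which is why the groupby key also includes A and B:

import pandas as pd

# hand-built subset of the sample data (A10 rows only), just for illustration
df = pd.DataFrame({'A': ['A10'] * 7,
                   'B': [1, 1, 1, 1, 2, 2, 2],
                   'measure': [0, 0, 0, 0, 0, 1, 2]})

# id increments whenever measure changes, so each run of equal values gets one id
g = df['measure'].ne(df['measure'].shift()).cumsum()
print(pd.concat([df['B'], df['measure'], g.rename('g')], axis=1))
#    B  measure  g
# 0  1        0  1
# 1  1        0  1
# 2  1        0  1
# 3  1        0  1
# 4  2        0  1
# 5  2        1  2
# 6  2        2  3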
Similar to @jezrael's answer, but slightly different logic:
# group each zero with the preceding nonzero value, per (A, B), then count only
# the zero rows in each group; nonzero rows stay NaN and are filled with 0
df.loc[df.measure.eq(0), 'Consec_zero_count'] = (df.groupby(['A', 'B', df.measure.ne(0).cumsum()])
                                                   .measure.transform(lambda x: x[x.eq(0)].size))
df['Consec_zero_count'] = df['Consec_zero_count'].fillna(0).astype(int)
>>> df
A B DATE hour measure Consec_zero_count
0 A10 1 1/1/2014 0 0 4
1 A10 1 1/1/2014 1 0 4
2 A10 1 1/1/2014 2 0 4
3 A10 1 1/1/2014 3 0 4
4 A10 2 1/1/2014 4 0 1
5 A10 2 1/1/2014 5 1 0
6 A10 2 1/1/2014 6 2 0
7 A10 3 1/1/2014 7 0 1
8 A11 1 1/1/2014 8 0 2
9 A11 1 1/1/2014 9 0 2
10 A11 1 1/1/2014 10 2 0
11 A11 1 1/1/2014 11 0 2
12 A11 1 1/1/2014 12 0 2
13 A12 2 1/1/2014 13 1 0
14 A12 2 1/1/2014 14 3 0
15 A12 2 1/1/2014 15 0 1
16 A12 4 1/1/2014 16 5 0
17 A12 4 1/1/2014 17 0 1
18 A12 6 1/1/2014 18 0 1
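For comparison, a minimal sketch (again my own illustration, not code from the answer) of the grouping key used here: measure.ne(0).cumsum() only increments on nonzero values, so each zero row shares an id with the nonzero row that precedes it, and the lambda then counts only the zero rows inside each group:

import pandas as pd

s = pd.Series([0, 0, 1, 0, 0, 0, 2, 0], name='measure')

# the key only increments on nonzero values; trailing zeros share the id of the
# nonzero value before them (leading zeros share id 0)
key = s.ne(0).cumsum()
print(pd.concat([s, key.rename('key')], axis=1))
#    measure  key
# 0        0    0
# 1        0    0
# 2        1    1
# 3        0    1
# 4        0    1
# 5        0    1
# 6        2    2
# 7        0    2

Because each group can contain one nonzero row, the transform filters with x[x.eq(0)].size instead of using size directly.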