I'm trying to get the max count of consecutive 0 values from a given data frame with id,date,value columns from a data frame on pandas which look's like that:
id date value
354 2019-03-01 0
354 2019-03-02 0
354 2019-03-03 0
354 2019-03-04 5
354 2019-03-05 5
354 2019-03-09 7
354 2019-03-10 0
357 2019-03-01 5
357 2019-03-02 5
357 2019-03-03 8
357 2019-03-04 0
357 2019-03-05 0
357 2019-03-06 7
357 2019-03-07 7
540 2019-03-02 7
540 2019-03-03 8
540 2019-03-04 9
540 2019-03-05 8
540 2019-03-06 7
540 2019-03-07 5
540 2019-03-08 2
540 2019-03-09 3
540 2019-03-10 2
The desired result will be grouped by the Id and will look like this:
id max_consecutive_zeros
354 3
357 2
540 0
I've achieved what i want with a for but it gets really slow when you are working with huge pandas dataframes, i've found some similar solutions but it didn't work with my problem at all.
Create groupID m
for consecutive rows of same value. Next, groupby
on id
and m
and call value_counts
, and .loc
on multiindex to slice only 0
value of the right-most index level. Finally, filter out duplicates index by duplicated
in id
and reindex to create 0 value for id
having no 0
count
m = df.value.diff().ne(0).cumsum().rename('gid')
#Consecutive rows having the same value will be assigned same IDNumber by this command.
#It is the way to identify a group of consecutive rows having the same value, so I called it groupID.
df1 = df.groupby(['id', m]).value.value_counts().loc[:,:,0].droplevel(-1)
#this groupby groups consecutive rows of same value per ID into separate groups.
#within each group, count number of each value and `.loc` to pick specifically only `0` because we only concern on the count of value `0`.
df1[~df1.index.duplicated()].reindex(df.id.unique(), fill_value=0)
#There're several groups of value `0` per `id`. We want only group of highest count.
#`value_count` already sorted number of count descending, so we just need to pick
#the top one of duplicates by slicing on True/False mask of `duplicated`.
#finally, `reindex` adding any `id` doesn't have value 0 in original `df`.
#Note: `id` is the column `id` in `df`. It is different from groupID `m` we create to use with groupby
Out[315]:
id
354 3
357 2
540 0
Name: value, dtype: int64
Here is one way we need to create the additional key for groupby
then , just need groupby
this key and id
s=df.groupby('id').value.apply(lambda x : x.ne(0).cumsum())
df[df.value==0].groupby([df.id,s]).size().max(level=0).reindex(df.id.unique(),fill_value=0)
Out[267]:
id
354 3
357 2
540 0
dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With