How to count longest uninterrupted sequence in pandas

Question

Let's say I have pd.Series like below

s = pd.Series([False, True, False,True,True,True,False, False])    

0    False
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool

I want to know how long is the longest True sequence, in this example, it is 3.

I tried it in a stupid way.

s_list = s.tolist()
count = 0
max_count = 0
for item in s_list:
    if item:
        count +=1
    else:
        if count>max_count:
            max_count = count
        count = 0
print(max_count)

It will print 3, but in a Series of all True, it will print 0

piRSquared · Accepted Answer

Option 1
Use a the series itself to mask the cumulative sum of the negation. Then use value_counts

(~s).cumsum()[s].value_counts().max()

3

explanation

(~s).cumsum() is a pretty standard way to produce distinct True/False groups
```
0    1
1    1
2    2
3    2
4    2
5    2
6    3
7    4
dtype: int64
```
But you can see that the group we care about is represented by the 2s and there are four of them. That's because the group is initiated by the first False (which becomes True with (~s)). Therefore, we mask this cumulative sum with the boolean mask we started with.
```
(~s).cumsum()[s]

1    1
3    2
4    2
5    2
dtype: int64
```
Now we see the three 2s pop out and we just have to use a method to extract them. I used value_counts and max.

Option 2
Use factorize and bincount

a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()

3

explanation
This is a similar explanation as for option 1. The main difference is in how I a found the max. I use pd.factorize to tokenize the values into integers ranging from 0 to the total number of unique values. Given the actual values we had in (~a).cumsum() we didn't strictly need this part. I used it because it's a general purpose tool that could be used on arbitrary group names.

After pd.factorize I use those integer values in np.bincount which accumulates the total number of times each integer is used. Then take the maximum.

Option 3
As stated in the explanation of option 2, this also works:

a = s.values
np.bincount((~a).cumsum()[a]).max()

3

How to count longest uninterrupted sequence in pandas

Tags:

python

pandas

Dawei

Video Answer

1 Answers

piRSquared

Recent Activity

Donate For Us

How to count longest uninterrupted sequence in pandas

Tags:

python

pandas

Dawei

Video Answer

1 Answers

piRSquared

Related questions

Recent Activity

Donate For Us