I have a DataFrame made up of True and False values.
Sample Table:
       A      B      C
0  False   True  False
1  False  False  False
2   True   True  False
3   True   True   True
4  False   True  False
5   True   True   True
6   True  False  False
7   True  False   True
8  False   True   True
9   True  False  False
I want to count consecutive True values in every column, and if a column has more than one run of consecutive True values, I want the length of the longest run.
For the table above, I would get:
length = [3, 4, 2]
I found similar threads but none resolved my problem.
Since I have (and will have) many more columns (products), I need to do this for the whole table regardless of column names, and get an array as the result.
If possible, I'd also like to get the index of the first True of the longest sequence, i.e. where the longest True run starts. For this table the result would be:
index = [5, 2, 7]
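For reference, a minimal sketch that rebuilds the sample table above so the snippets below are reproducible; the column names and values are taken from the question's table, and pandas is assumed to be imported as pd:
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'A': [False, False, True, True, False, True, True, True, False, True],
    'B': [True, False, True, True, True, True, False, False, True, False],
    'C': [False, False, False, True, False, True, False, True, True, False],
})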
The solution can be simplified if there is always at least one True per column:
# running count of True values per column
b = df.cumsum()
# subtract the count reached at the last False, so the counter restarts after every False
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
print(c)
   A  B  C
0  0  1  0
1  0  0  0
2  1  1  0
3  2  2  1
4  0  3  0
5  1  4  1
6  2  0  0
7  3  0  1
8  0  1  2
9  1  0  0
# longest True run per column = maximal value of the streak counter
length = c.max().tolist()
print(length)
[3, 4, 2]
# start index of the longest run: position where the counter first reaches its max,
# minus the run length, plus 1
index = c.idxmax().sub(length).add(1).tolist()
print(index)
[5, 2, 7]
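As a sanity check (not part of the answer's method), the same numbers can be computed per column in plain Python with itertools.groupby; the sketch below assumes the df built from the sample table and a default RangeIndex, so positional offsets equal index labels:
from itertools import groupby

def longest_true_run(values):
    # Return (length, start position) of the longest run of True values,
    # keeping the first run on ties; (0, 0) if there is no True at all.
    best_len, best_start, pos = 0, 0, 0
    for value, group in groupby(values):
        run = len(list(group))
        if value and run > best_len:
            best_len, best_start = run, pos
        pos += run
    return best_len, best_start

print([longest_true_run(df[col]) for col in df.columns])
# [(3, 5), (4, 2), (2, 7)]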
Detail:
print(pd.concat([b,
                 b.mask(df),
                 b.mask(df).ffill(),
                 b.mask(df).ffill().fillna(0),
                 b.sub(b.mask(df).ffill().fillna(0)).astype(int)],
                axis=1,
                keys=('cumsum', 'mask', 'ffill', 'fillna', 'sub')))
    cumsum           mask          ffill         fillna      sub
   A  B  C    A    B    C    A    B    C    A    B    C  A  B  C
0  0  1  0  0.0  NaN  0.0  0.0  NaN  0.0  0.0  0.0  0.0  0  1  0
1  0  1  0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  0  0  0
2  1  2  0  NaN  NaN  0.0  0.0  1.0  0.0  0.0  1.0  0.0  1  1  0
3  2  3  1  NaN  NaN  NaN  0.0  1.0  0.0  0.0  1.0  0.0  2  2  1
4  2  4  1  2.0  NaN  1.0  2.0  1.0  1.0  2.0  1.0  1.0  0  3  0
5  3  5  2  NaN  NaN  NaN  2.0  1.0  1.0  2.0  1.0  1.0  1  4  1
6  4  5  2  NaN  5.0  2.0  2.0  5.0  2.0  2.0  5.0  2.0  2  0  0
7  5  5  3  NaN  5.0  NaN  2.0  5.0  2.0  2.0  5.0  2.0  3  0  1
8  5  6  4  5.0  NaN  NaN  5.0  5.0  2.0  5.0  5.0  2.0  0  1  2
9  6  6  4  NaN  6.0  4.0  5.0  6.0  4.0  5.0  6.0  4.0  1  0  0
EDIT:
A general solution that also works for columns containing only False values - add numpy.where with a boolean mask created by DataFrame.any():
print(df)
       A      B      C
0  False   True  False
1  False  False  False
2   True   True  False
3   True   True  False
4  False   True  False
5   True   True  False
6   True  False  False
7   True  False  False
8  False   True  False
9   True  False  False
import numpy as np

b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
# True for columns that contain at least one True value
mask = df.any()
# columns without any True get length -1
length = np.where(mask, c.max(), -1).tolist()
print(length)
[3, 4, -1]
# and start index 0
index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
print(index)
[5, 2, 0]
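For convenience, the whole thing can be wrapped into a small helper; the function name below is hypothetical, and it assumes a default RangeIndex (as in the question), since the start index is derived by arithmetic on index labels:
import numpy as np
import pandas as pd

def longest_true_runs(df):
    # Length and start index of the longest True run in every column;
    # columns with no True get length -1 and start 0, as in the EDIT above.
    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    has_true = df.any()
    length = np.where(has_true, c.max(), -1).tolist()
    index = np.where(has_true, c.idxmax().sub(c.max()).add(1), 0).tolist()
    return length, index

length, index = longest_true_runs(df)
print(length, index)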