
find groups of neighboring True in pandas series

Tags: python, pandas

I have a boolean series and need to find all groups of consecutive True values. That is, for each run of neighboring True values I need its start index and end index.

The following code gives the intended result but is slow and clumsy.

import pandas as pd

def groups(ser):
    g = []

    flag = False
    start = None
    for idx, s in ser.items():
        if flag and not s:
            g.append((start, idx-1))
            flag = False
        elif not flag and s:
            start = idx
            flag = True
    if flag:
        g.append((start, idx))
    return g

if __name__ == "__main__":
    ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
    print(ser)

    g = groups(ser)
    print("\ngroups of True:")
    for start, end in g:
        print("from {} until {}".format(start, end))
    pass

output is:

0      True
1      True
2     False
3     False
4      True
5     False
6     False
7      True
8      True
9      True
10     True
11    False
12     True
13    False
14     True

groups of True:
from 0 until 1
from 4 until 4
from 7 until 10
from 12 until 12
from 14 until 14

There are similar questions out there, but none of them asks for the indices at which each group starts and ends.

  • Label contiguous groups of True elements within a pandas Series
  • Streaks of True or False in pandas Series
user7431005 asked Mar 04 '21 14:03



2 Answers

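A common idiom for finding consecutive blocks is to take the cumulative sum of the negated series. That sum stays constant across a run of True values and increases at every False, so it can serve as a group key for the True entries. For example:

# ser is the boolean Series from the question
for _, x in ser[ser].groupby((~ser).cumsum()):
    print(f'from {x.index[0]} to {x.index[-1]}')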

Output:

from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
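As a rough sketch of the same cumsum idea, the loop can be wrapped in a function that returns (start, end) tuples like the `groups` function in the question. The name `true_groups` is made up for this illustration, and the sketch assumes the series has `bool` dtype so that `~` acts as logical negation.

```python
import pandas as pd


def true_groups(ser):
    """Return (start_label, end_label) for each run of consecutive True values.

    Hypothetical helper for illustration; assumes `ser` has bool dtype.
    """
    # The running count of False values is constant inside a run of True
    # values and grows at every False, so it identifies each run.
    run_id = (~ser).cumsum()
    # Keep only the True rows and group them by their run identifier.
    true_only = ser[ser]
    return [(x.index[0], x.index[-1]) for _, x in true_only.groupby(run_id[ser])]


ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
result = true_groups(ser)
for start, end in result:
    print(f"from {start} until {end}")
```

Because the function reads the first and last index label of each group instead of computing `idx - 1`, it should also work for indexes that are not consecutive integers.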
Quang Hoang answered Oct 19 '22 20:10


You can use itertools:

from operator import itemgetter
from itertools import groupby

a = ser[ser].index.tolist()  # list of index labels where `ser` is True

# Consecutive integers keep a constant (position - value) difference,
# so that difference is the same within a run and changes between runs.
for k, g in groupby(enumerate(a), lambda ix: ix[0] - ix[1]):
    l = list(map(itemgetter(1), g))
    print(f'from {l[0]} to {l[-1]}')

Output:

from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
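To see why the position-minus-value key separates the runs, here is a small sketch using the True positions from the question, written out by hand as a plain list. The variable names are chosen for illustration only, and the trick assumes an integer index in which neighboring entries differ by exactly 1.

```python
from itertools import groupby

# Index labels of the True entries in the question's series.
a = [0, 1, 4, 7, 8, 9, 10, 12, 14]

# Position in the list minus the value: constant while values increase by 1.
keys = [pos - val for pos, val in enumerate(a)]
print(keys)  # [0, 0, -2, -4, -4, -4, -4, -5, -6]

# Group the (position, value) pairs by that key and keep first/last value.
runs = []
for _, grp in groupby(enumerate(a), key=lambda p: p[0] - p[1]):
    values = [val for _, val in grp]
    runs.append((values[0], values[-1]))
print(runs)  # [(0, 1), (4, 4), (7, 10), (12, 12), (14, 14)]
```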
Mayank Porwal answered Oct 19 '22 20:10