I have a series with True and False values and need to find all groups of True.
This means that I need to find the start index and end index of neighboring True values.
The following code gives the intended result but is very slow, inefficient and clumsy.
import pandas as pd

def groups(ser):
    g = []
    flag = False
    start = None
    for idx, s in ser.items():
        if flag and not s:
            g.append((start, idx-1))
            flag = False
        elif not flag and s:
            start = idx
            flag = True
    if flag:
        g.append((start, idx))
    return g

if __name__ == "__main__":
    ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
    print(ser)
    g = groups(ser)
    print("\ngroups of True:")
    for start, end in g:
        print("from {} until {}".format(start, end))
    pass
The output is:
0 True
1 True
2 False
3 False
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 False
12 True
13 False
14 True
groups of True:
from 0 until 1
from 4 until 4
from 7 until 10
from 12 until 12
from 14 until 14
There are similar questions out there, but none of them is looking for the indices of the group starts/ends.
It's common to use cumsum on the negation to check for consecutive blocks. For example, with s being the boolean series:
for _, x in s[s].groupby((1-s).cumsum()):
    print(f'from {x.index[0]} to {x.index[-1]}')
Output:
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
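To see why the cumsum grouping above works: the cumulative sum of the negated series only increases at a False, so all True values inside one run share the same running total, and that total can serve as a group key. Below is a minimal sketch that wraps the idea in a function returning (start, end) pairs like the loop in the question. The names true_runs and run_id are made up for this illustration, and ~ser is used in place of 1-s, which assumes the series has bool dtype.

```python
import pandas as pd

def true_runs(ser):
    """Return (first_label, last_label) for each run of consecutive True values.

    Hypothetical helper for illustration; assumes `ser` has bool dtype.
    """
    # Every False bumps the counter, so all True values in one run share a key.
    run_id = (~ser).cumsum()
    # Keep only the True rows and group them by the run they belong to.
    true_rows = ser[ser]
    return [(grp.index[0], grp.index[-1])
            for _, grp in true_rows.groupby(run_id[ser])]

ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
for start, end in true_runs(ser):
    print(f'from {start} to {end}')
```

Because the start and end are taken from the index of each group, this should also work for a non-integer index, where the idx-1 in the original loop would not.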
You can use itertools:
In [478]: from operator import itemgetter
     ...: from itertools import groupby

In [489]: a = ser[ser].index.tolist()  # Create a list of indexes having `True` in `ser`

In [498]: for k, g in groupby(enumerate(a), lambda ix : ix[0] - ix[1]):
     ...:     l = list(map(itemgetter(1), g))
     ...:     print(f'from {l[0]} to {l[-1]}')
     ...:
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
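The key ix[0] - ix[1] is what makes this work: within a run of consecutive integers, the position in the list and the value both grow by 1, so their difference stays constant, and it changes as soon as a value is skipped. Here is a small pandas-free sketch of just that step. The function name consecutive_runs is an assumption made for this example, and it expects a sorted list of integer positions.

```python
from itertools import groupby

def consecutive_runs(values):
    """Split a sorted list of integers into (first, last) pairs of consecutive runs.

    Hypothetical helper for illustration only.
    """
    runs = []
    # position - value stays constant while the values increase by exactly 1
    for _, grp in groupby(enumerate(values), key=lambda pair: pair[0] - pair[1]):
        members = [value for _, value in grp]
        runs.append((members[0], members[-1]))
    return runs

# Positions of True in the example series from the question
true_positions = [0, 1, 4, 7, 8, 9, 10, 12, 14]
print(consecutive_runs(true_positions))
# [(0, 1), (4, 4), (7, 10), (12, 12), (14, 14)]
```

Note that this relies on integer positions, so for a series with a non-integer index the positions would have to be computed first rather than taken from ser[ser].index.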