Say that I have a df like this:
Value
0 True
1 True
2 False
3 False
4 False
5 True
6 True
7 False
8 True
9 True
And say that I want to assign a label to each group of True values, such that consecutive True values get the same label because they form a cluster, whereas False values always get 0:
Value Label
0 True 1
1 True 1
2 False 0
3 False 0
4 False 0
5 True 2
6 True 2
7 False 0
8 True 3
9 True 3
How could I do this in pandas?
Try this:
>>> df['Label'] = df[df['Value']].index.to_series().diff().ne(1).cumsum()
>>> df
Value Label
0 True 1.0
1 True 1.0
2 False NaN
3 False NaN
4 False NaN
5 True 2.0
6 True 2.0
7 False NaN
8 True 3.0
9 True 3.0
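Note that the output above has NaN for the False rows and float labels, while the question asks for 0 and integers. One possible way to close that gap, sketched here under the assumption that df has the default 0..n-1 integer index (the approach relies on consecutive index values), is to reindex the labels back onto the full index with a fill value of 0. The intermediate names `true_idx` and `labels` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Value": [True, True, False, False, False,
                             True, True, False, True, True]})

# Index positions of the True rows only: 0, 1, 5, 6, 8, 9.
true_idx = df[df["Value"]].index.to_series()

# A step other than 1 between neighbouring positions starts a new cluster.
labels = true_idx.diff().ne(1).cumsum()

# False rows are absent from `labels`; reindexing fills them with 0
# instead of NaN, so the column stays integer.
df["Label"] = labels.reindex(df.index, fill_value=0)
print(df)
```

Calling `.fillna(0).astype(int)` on the original float column should give the same result.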
Here is another approach that is fully independent of the index:
m = df['Value']
df['Label'] = m.ne(m.shift()).cumsum().where(m)//2+df['Value'].iloc[0]
Explanation: start a new group whenever a value differs from the previous one, keep only the True groups, floor-divide the group number by two to account for the alternating True/False groups, and add the first value (1 if True, 0 if False) to correct the initial group number.
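To make the steps easier to follow, here is the same idea split into intermediate variables. It is a sketch on a made-up frame: the string index and the names `run_id`, `true_runs` and `label` are assumptions for illustration, and the final `fillna(0).astype(int)` is an addition to get 0 for the False rows as the question asks.

```python
import pandas as pd

# Hypothetical data: starts with False and uses a non-numeric index on purpose.
df = pd.DataFrame({"Value": [False, True, True, False, True]},
                  index=list("abcde"))

m = df["Value"]

# New run id every time the value changes: 1, 2, 2, 3, 4
run_id = m.ne(m.shift()).cumsum()

# Keep only the runs of True: NaN, 2, 2, NaN, 4
true_runs = run_id.where(m)

# True runs take every second id; halve, then shift by the first value
# (0 here because the series starts with False): NaN, 1, 1, NaN, 2
label = true_runs // 2 + int(m.iloc[0])

# Added step: False rows become 0 and the column becomes integer.
df["Label"] = label.fillna(0).astype(int)
print(df)
```

Because nothing here looks at the index values, the same code works for a DatetimeIndex or an unsorted index as well.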