My dataframe looks like this:
import pandas as pd
example = [{'A':3}, {'A':5}, {'A':0}, {'A':2}, {'A':6}, {'A':9}, {'A':0}, {'A':3}, {'A':4}]
df = pd.DataFrame(example)
print(df)
Output:
df
3
5
0
2
6
9
0
3
4
A new 'cluster' starts after a 0 shows up in the df. I want to give each of these clusters a unique value, like this:
df
3 A
5 A
0 -
2 B
6 B
9 B
0 -
3 C
4 C
I have tried using enumerate and itertools but since I am new to Python I am struggling with the correct usage and syntax of these options.
You can use cumsum and map to letters with chr:
m = df['A'].eq(0)
df['B'] = m.cumsum().add(65).map(chr).mask(m, '-')
df
A B
0 3 A
1 5 A
2 0 -
3 2 B
4 6 B
5 9 B
6 0 -
7 3 C
8 4 C
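To see why the one-liner above works, it can help to split it into its intermediate steps. This is only a sketch: the names `group_id`, `letters`, `labels` and `steps` are mine, not part of the answer, and the dataframe is rebuilt from the question's values.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

m = df['A'].eq(0)                    # True on the separator rows (A == 0)
group_id = m.cumsum()                # running count of zeros seen so far
letters = group_id.add(65).map(chr)  # 65 is ord('A'), so 0 -> 'A', 1 -> 'B', ...
labels = letters.mask(m, '-')        # overwrite the separator rows with '-'

# Put every intermediate step side by side for inspection
steps = pd.DataFrame({'A': df['A'], 'is_zero': m, 'group_id': group_id,
                      'letter': letters, 'label': labels})
print(steps)
```

Each 0 bumps the running count by one, so every row after it lands in the next group. The 0 row itself already carries the new count, which is why it has to be masked separately.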
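A NumPy solution can be written from this using views, and should be quite fast. Note that it assumes the cumulative sum is a 64-bit integer array, and that it does not replace the 0 rows with '-':
import numpy as np
m = np.cumsum(df['A'].values == 0)
# thanks to @user3483203 for the neat trick!
# reinterpret each 8-byte integer code as a 2-character unicode string
df['B'] = (m + 65).view('U2')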
df
A B
0 3 A
1 5 A
2 0 B
3 2 B
4 6 B
5 9 B
6 0 C
7 3 C
8 4 C
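The view trick works because a 'U2' item is two 4-byte code points, the same 8 bytes as one 64-bit integer, so 65 is read back as 'A' followed by a null that NumPy strips. Below is a hedged sketch that pins the integer width and byte order explicitly and also restores the '-' on the 0 rows with np.where. The explicit '<i8' and '<U2' dtypes and the names `is_zero`, `codes`, `letters` and `labels` are my additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

is_zero = df['A'].to_numpy() == 0

# Force little-endian 8-byte integers so each element matches one '<U2' item
codes = (np.cumsum(is_zero) + 65).astype('<i8')

# Reinterpret the same bytes as 2-character unicode strings (no copy is made)
letters = codes.view('<U2')

# Replace the separator rows with '-'
labels = np.where(is_zero, '-', letters)

df['B'] = labels
print(df)
```

Without the explicit dtype, the result depends on the platform's default integer size, so the plain `view('U2')` version may fail or mislabel rows where that default is 32 bits.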
From v0.22, you can also do this through pandas Series.view (deprecated as of pandas 2.2, so this may warn or fail on recent versions):
m = df['A'].eq(0)
df['B'] = (m.cumsum()+65).view('U2').mask(m, '-')
df
A B
0 3 A
1 5 A
2 0 -
3 2 B
4 6 B
5 9 B
6 0 -
7 3 C
8 4 C
Here's one way using np.where. I'm using numerical labeling here, which might be more appropriate when there are many groups:
import numpy as np
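# select the column explicitly so any extra columns cannot interfere
m = df['A'].eq(0)
# note: this overwrites column A, and the numbers become strings alongside '-'
df['A'] = np.where(m, '-', m.cumsum())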
A
0 0
1 0
2 -
3 1
4 1
5 1
6 -
7 2
8 2
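To illustrate why numbers scale better than letters, here is a small sketch with more than 26 clusters. The data is made up for the demonstration (30 single-value clusters separated by zeros), and the names `big`, `letter_labels` and `number_labels` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 clusters of one value each, each followed by a 0
values = []
for i in range(30):
    values += [i + 1, 0]
big = pd.DataFrame({'A': values})

m = big['A'].eq(0)
group_id = m.cumsum()

# Letter labels: chr(65 + n) leaves the A-Z range once n reaches 26
letter_labels = group_id.add(65).map(chr).mask(m, '-')

# Numeric labels: keep working for any number of clusters
number_labels = np.where(m.to_numpy(), '-', group_id.astype(str).to_numpy())

# Row 52 is the single value of the 27th cluster (group id 26)
print(letter_labels.iloc[52], number_labels[52])
```

With letters, the 27th cluster is labelled '[' because chr(91) is the character after 'Z', while the numeric version simply continues with '26'.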