How to assign unique values to groups of rows in a pandas dataframe based on a condition?

My dataframe looks like this:

import pandas as pd
example = [{'A':3}, {'A':5}, {'A':0}, {'A':2}, {'A':6}, {'A':9}, {'A':0}, {'A':3}, {'A':4}]
df = pd.DataFrame(example)
print(df)

Output:

df
3
5
0
2
6
9
0
3
4

A new 'cluster' starts after a 0 shows up in the df. I want to give each of these clusters a unique value, like this:

df
3    A
5    A
0    -
2    B
6    B
9    B
0    -
3    C
4    C

I have tried using enumerate and itertools, but since I am new to Python I am struggling with the correct usage and syntax of these options.

DnVS asked Jun 17 '19 14:06


2 Answers

You can use cumsum and map to letters with chr:

m = df['A'].eq(0)
df['B'] = m.cumsum().add(65).map(chr).mask(m, '-')
df

   A  B
0  3  A
1  5  A
2  0  -
3  2  B
4  6  B
5  9  B
6  0  -
7  3  C
8  4  C
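
To see why this works, it can help to look at each intermediate step. The sketch below rebuilds the frame from the question and splits the chain apart; the names `group_id` and `letters` are only illustrative and not part of the original one-liner.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

# True exactly on the separator rows (the zeros)
m = df['A'].eq(0)

# Running count of zeros seen so far, so it steps up at every separator
group_id = m.cumsum()

# 65 is the code point of 'A', so 0 -> 'A', 1 -> 'B', and so on
letters = group_id.add(65).map(chr)

# Overwrite the separator rows themselves with '-'
df['B'] = letters.mask(m, '-')
print(df)
```

Keep in mind that chr only yields capital letters for the first 26 groups; beyond that the labels run into other characters.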

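A NumPy solution can be written from this using views, and should be quite fast. Note that, as written, it labels the zero rows as well instead of masking them with '-':

import numpy as np

m = np.cumsum(df['A'].values == 0)
# thanks to @user3483203 for the neat trick!
# reinterprets each 8-byte integer as a 2-character string, so this assumes a 64-bit integer dtype
df['B'] = (m + 65).view('U2')
df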

   A  B
0  3  A
1  5  A
2  0  B
3  2  B
4  6  B
5  9  B
6  0  C
7  3  C
8  4  C
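
If the view trick has to work regardless of the platform's default integer width, one possible variant is to cast to int32 first and view the result as 'U1', since both use 4 bytes per element. This is a sketch under that assumption rather than the original answer's code; it also masks the zero rows with np.where, and the names `is_zero`, `codes` and `letters` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

is_zero = df['A'].to_numpy() == 0

# Group counter shifted so that group 0 maps to the code point of 'A'
codes = np.cumsum(is_zero) + 65

# int32 and 'U1' are both 4 bytes per element, so each integer is
# reinterpreted as exactly one character
letters = codes.astype(np.int32).view('U1')

# Replace the separator rows with '-'
df['B'] = np.where(is_zero, '-', letters)
print(df)
```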

From v0.22, you can also do this through pandas Series.view (Series.view has since been deprecated, as of pandas 2.2, so this applies to older versions):

m = df['A'].eq(0)
df['B'] = (m.cumsum()+65).view('U2').mask(m, '-')
df

   A  B
0  3  A
1  5  A
2  0  -
3  2  B
4  6  B
5  9  B
6  0  -
7  3  C
8  4  C
cs95 answered Oct 17 '22 22:10


Here's one way using np.where. I'm using numerical labeling here, which might be more appropriate when there are many groups:

import numpy as np

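# select the column explicitly so this still works if df has other columns
m = df['A'].eq(0)
# overwrites column A; the labels are stored as strings because of the '-'
df['A'] = np.where(m, '-', m.cumsum())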

   A
0  0
1  0
2  - 
3  1
4  1
5  1
6  - 
7  2
8  2
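
If the original values should be kept and the labels should stay numeric, a possible variation is to write them to a separate column with pandas' nullable integer dtype. This is only a sketch: the column name `cluster` is a made-up choice, and the groupby at the end is just one example of what the labels could be used for.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

m = df['A'].eq(0)

# Number the clusters, blank out the separator rows, and use the
# nullable integer dtype so the labels stay integers next to <NA>
df['cluster'] = m.cumsum().where(~m).astype('Int64')

# The labels can then drive a groupby; rows whose key is <NA> are
# dropped by default
sums = df.groupby('cluster')['A'].sum()
print(df)
print(sums)
```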
yatu answered Oct 17 '22 23:10