Assign Unique Numeric Group IDs to Groups in Pandas [duplicate]

Question

I keep running into the problem of having to assign a unique ID to each group in a data set. I've needed this when zero-padding sequences for RNNs, generating graphs, and on many other occasions.

This can usually be done by concatenating the values of each groupby column. However, the number of columns that define a group, their dtypes, or the size of their values often makes concatenation impractical and needlessly memory-hungry.

Is there an easy way to assign a unique numeric ID to each group in pandas?

BENY · Accepted Answer

You just need ngroup (or pd.factorize). Using the data from seeiespi's answer:

df.groupby('C').ngroup()
Out[322]: 
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int64

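More options. Note that pd.factorize numbers the groups in order of first appearance, whereas ngroup and the category codes follow the sorted order of the keys: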

pd.factorize(df.C)[0]
Out[323]: array([0, 0, 1, 2, 2, 2, 2, 1, 1], dtype=int64)
df.C.astype('category').cat.codes
Out[324]: 
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int8
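Since the question is mostly about groups defined by several columns, here is a minimal sketch of how ngroup handles that case, along with the sort=False variant that reproduces the pd.factorize numbering. The second key column 'D' and its values are made up purely for illustration and are not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 6, 3, 7, 3, 2],
    'B': [4, 3, 8, 2, 6, 3, 9, 1, 0],
    'C': ['a', 'a', 'c', 'b', 'b', 'b', 'b', 'c', 'c'],
})

# Default (sort=True): groups are numbered in sorted key order, a=0, b=1, c=2
sorted_ids = df.groupby('C').ngroup()

# sort=False: groups are numbered in order of first appearance, a=0, c=1, b=2,
# which is the same numbering pd.factorize produces
appearance_ids = df.groupby('C', sort=False).ngroup()

# Hypothetical second key column, added only to illustrate multi-column groups
df['D'] = ['x', 'y', 'x', 'x', 'x', 'y', 'y', 'x', 'x']

# One integer per distinct (C, D) pair, with no string concatenation involved
multi_ids = df.groupby(['C', 'D']).ngroup()

print(sorted_ids.tolist())      # [0, 0, 2, 1, 1, 1, 1, 2, 2]
print(appearance_ids.tolist())  # [0, 0, 1, 2, 2, 2, 2, 1, 1]
print(multi_ids.tolist())       # [0, 1, 4, 2, 2, 3, 3, 4, 4]
```

The result is an integer Series aligned with the original index, so it can be assigned straight back with df['gid'] = ... regardless of how many columns define the group.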

seeiespi · Answer

I worked out a simple solution that I keep coming back to and wanted to share:

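import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4,6,3,7,3,2],'B':[4,3,8,2,6,3,9,1,0], 'C':['a','a','c','b','b','b','b','c','c']})

# Sort so that rows of the same group are contiguous; mergesort is stable,
# so rows keep their original relative order within each group
df = df.sort_values('C', kind='mergesort')

# Flag the first row of each group with 1 and every other row with 0
df['gid'] = (df.groupby(['C']).cumcount()==0).astype(int)

# The running total of the flags is a 1-based group ID
df['gid'] = df['gid'].cumsum()

In [17]: df
Out[17]:
   A  B  C  gid
0  1  4  a    1
1  2  3  a    1
3  4  2  b    2
4  6  6  b    2
5  3  3  b    2
6  7  9  b    2
2  3  8  c    3
7  3  1  c    3
8  2  0  c    3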
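For comparison, the same 1-based IDs can be obtained without reordering the frame by shifting ngroup by one. This is only a sketch to show that the two approaches agree on this data, not a claim about which one the original author prefers; the variable names are my own.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 6, 3, 7, 3, 2],
    'B': [4, 3, 8, 2, 6, 3, 9, 1, 0],
    'C': ['a', 'a', 'c', 'b', 'b', 'b', 'b', 'c', 'c'],
})

# Sort-and-cumsum approach: flag the first row of each group, then accumulate
sorted_df = df.sort_values('C', kind='mergesort')
gid_cumsum = (sorted_df.groupby('C').cumcount() == 0).astype(int).cumsum()

# ngroup approach: 0-based IDs in sorted key order, shifted to start at 1
gid_ngroup = df.groupby('C').ngroup() + 1

# After restoring the original row order, both give the same ID per row
same = bool((gid_cumsum.sort_index() == gid_ngroup).all())
print(same)  # True
```

The ngroup version leaves df in its original row order, which matters if later steps rely on that order.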

Tags: python, pandas, pandas-groupby
