Pandas Random ID Category

I would like to assign a pseudo-randomly generated ID to each name in a DataFrame, so that every distinct name gets its own ID.

I can already assign a unique integer ID per name using cat.codes or ngroup():

import pandas as pd
import random
import string

df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})
df1['id'] = df1.groupby('Name').ngroup()
df1['idz'] = df1['Name'].astype('category').cat.codes

    Name    id  idz
0   John    2   2
1   Susie   3   3
2   Jack    0   0
3   Jill    1   1
4   John    2   2

I've also used a function from another post to generate a random ID row by row:

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.SystemRandom().choice(chars) for _ in range(size))

df1['random id'] = df1['idz'].apply(lambda x : id_generator(3))

    Name    id  idz random id
0   John    2   2   118 #<--- Check Here
1   Susie   3   3   KGZ
2   Jack    0   0   KMQ
3   Jill    1   1   T2L
4   John    2   2   Q3F #<--- Check Here

But how do I combine the two so that John, in this small example, receives the same ID in both rows? Because of the size of the data, I'd like to avoid a long loop of the form "if the ID is not used yet, assign it; if the name already has an ID, reuse it".

MattR asked Oct 18 '25 00:10


1 Answer

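Use groupby + transform: the generator is evaluated per group rather than per row, and transform broadcasts the resulting ID to every row of that group.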

df1['random id'] = df1.groupby('idz').idz.transform(lambda x : id_generator(3))
df1
Out[657]: 
    Name  id  idz random id
0   John   2    2       35P
1  Susie   3    3       6UU
2   Jack   0    0       XGF
3   Jill   1    1       5LC
4   John   2    2       35P
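
One caveat with short random IDs is that two different names could, by chance, draw the same string. The following is a minimal sketch of one way to guard against that, not a definitive implementation. It assumes a hypothetical helper, unique_random_ids (not part of the original post or of pandas), that redraws on collision, and then applies the resulting dictionary with Series.map so repeated names share one ID:

```python
import random
import string

import pandas as pd


def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    # Same helper as in the question: a random string of `size` characters.
    return ''.join(random.SystemRandom().choice(chars) for _ in range(size))


def unique_random_ids(keys, size=3, chars=string.ascii_uppercase + string.digits):
    # Hypothetical helper for illustration: one random ID per key,
    # redrawn on collision so that no two keys share an ID.
    keys = list(keys)
    if len(keys) > len(chars) ** size:
        raise ValueError("not enough distinct IDs of this size for all keys")
    used = set()
    mapping = {}
    for key in keys:
        new_id = id_generator(size, chars)
        while new_id in used:
            new_id = id_generator(size, chars)
        used.add(new_id)
        mapping[key] = new_id
    return mapping


df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})

# Build the mapping once over the distinct names, then look it up per row.
id_map = unique_random_ids(df1['Name'].unique(), size=3)
df1['random id'] = df1['Name'].map(id_map)
```

The loop here runs over the distinct names only, not over every row, so it should stay cheap as long as the number of distinct names is much smaller than the number of possible IDs.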
BENY answered Oct 20 '25 14:10


