How can I generate a new column that numbers each group of repeated values? For example, my dataframe is:
id color
123 white
123 white
123 white
345 blue
345 blue
678 red
This is the desired output:
# id color
1 123 white
1 123 white
1 123 white
2 345 blue
2 345 blue
3 678 red
Check with factorize, which returns zero-based integer codes in order of first appearance (hence the +1):
df['#']=df.id.factorize()[0]+1
df
id color #
0 123 white 1
1 123 white 1
2 123 white 1
3 345 blue 2
4 345 blue 2
5 678 red 3
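For anyone who wants to reproduce this from scratch, here is a minimal sketch. It assumes the frame is built by hand from the question's sample data; the variable names `codes` and `uniques` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [123, 123, 123, 345, 345, 678],
    "color": ["white", "white", "white", "blue", "blue", "red"],
})

# factorize returns (codes, uniques): codes are zero-based integers
# assigned in order of first appearance, uniques holds the distinct values
codes, uniques = pd.factorize(df["id"])

# Shift by one so that numbering starts at 1, as in the desired output
df["#"] = codes + 1
print(df)
```

Because the codes follow order of appearance, the ids do not need to be sorted for this to work.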
Another method, using groupby with ngroup (by default the groups are numbered in sorted key order):
df.groupby('id').ngroup()+1
0 1
1 1
2 1
3 2
4 2
5 3
dtype: int64
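One thing to watch: the two approaches only agree when the ids already appear in sorted order, as they do in the question. A small sketch with made-up, unsorted ids (the frame `s` is hypothetical, not from the question) shows the difference, and that passing `sort=False` to groupby should give numbering by order of appearance instead.

```python
import pandas as pd

# Hypothetical unsorted ids, only to illustrate the ordering difference
s = pd.DataFrame({"id": [345, 123, 345, 678, 123]})

# Default groupby sorts the keys, so 123 becomes group 1
by_sorted_key = (s.groupby("id").ngroup() + 1).tolist()

# sort=False keeps groups in order of first appearance, so 345 becomes group 1
by_appearance = (s.groupby("id", sort=False).ngroup() + 1).tolist()

# factorize also numbers by order of first appearance
by_factorize = (pd.factorize(s["id"])[0] + 1).tolist()

print(by_sorted_key)
print(by_appearance)
print(by_factorize)
```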
To add it at the first position (run this on the original df; insert raises a ValueError if a '#' column already exists):
df.insert(loc=0, column='#', value=df.id.factorize()[0]+1)
df
# id color
0 1 123 white
1 1 123 white
2 1 123 white
3 2 345 blue
4 2 345 blue
5 3 678 red
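If the '#' column was already appended at the end, one possible way to move it to the front is to pop it and re-insert it. This is just a sketch on the question's sample data, not the only way to reorder columns.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [123, 123, 123, 345, 345, 678],
    "color": ["white", "white", "white", "blue", "blue", "red"],
})

# Column is appended as the last column here
df["#"] = pd.factorize(df["id"])[0] + 1

# pop removes the column and returns it as a Series;
# insert then places it back at position 0
df.insert(0, "#", df.pop("#"))
print(df)
```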
You can also use categorical codes, which are zero-based (add 1 to match the desired output):
df['id'].astype('category').cat.codes
Output:
0 0
1 0
2 0
3 1
4 1
5 2
dtype: int8
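To get the same 1-based numbering as the desired output from the categorical codes, something like the following sketch should work. It uses the question's sample data; note that categories are sorted by value, so the codes follow sorted order rather than order of appearance, and missing values get code -1.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [123, 123, 123, 345, 345, 678],
    "color": ["white", "white", "white", "blue", "blue", "red"],
})

# cat.codes are zero-based positions into the sorted categories
codes = df["id"].astype("category").cat.codes

# Shift by one so that numbering starts at 1
df["#"] = codes + 1
print(df)
```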