Creating a new column assigning same index to repeated values in Pandas DataFrame [closed]

How can I generate a new column listing repeated values? For example, my dataframe is:

id    color
123   white
123   white
123   white
345   blue
345   blue
678   red

This is the desired output:

#    id   color
1   123   white
1   123   white
1   123   white
2   345   blue
2   345   blue
3   678   red
asked Jun 21 '20 at 01:06 by jimmy



2 Answers

Use factorize, which numbers each distinct id in order of first appearance, starting at 0 (hence the + 1):

df['#'] = df.id.factorize()[0] + 1
df
    id  color  #
0  123  white  1
1  123  white  1
2  123  white  1
3  345   blue  2
4  345   blue  2
5  678    red  3
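A self-contained sketch of the factorize approach. To show that the numbering follows first appearance rather than sorted order, the ids below are deliberately reordered (345 before 123); that reordering is an illustration, not data from the question.

```python
import pandas as pd

# Hypothetical reordering of the question's data: 345 now appears before 123
df = pd.DataFrame({
    'id':    [345, 345, 123, 123, 123, 678],
    'color': ['blue', 'blue', 'white', 'white', 'white', 'red'],
})

# factorize returns (codes, uniques); codes follow order of first appearance, 0-based
codes, uniques = df['id'].factorize()
df['#'] = codes + 1  # shift to 1-based numbering

print(df)
print(list(uniques))
```

Here 345 gets 1 because it is seen first, so the labels match the order of the rows rather than the numeric order of the ids.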

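Another method is groupby().ngroup(). By default it numbers groups in sorted key order, which here is the same as the order of appearance because the ids are already sorted: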

df.groupby('id').ngroup()+1
0    1
1    1
2    1
3    2
4    2
5    3
dtype: int64
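A minimal sketch of where ngroup and factorize can diverge, assuming ids that are not already sorted (the reordered frame below is illustrative, not from the question). Passing sort=False to groupby numbers groups by first appearance instead.

```python
import pandas as pd

# Hypothetical unsorted ids: 345 appears before 123
df = pd.DataFrame({
    'id':    [345, 345, 123, 123, 123, 678],
    'color': ['blue', 'blue', 'white', 'white', 'white', 'red'],
})

# Default groupby(sort=True): groups numbered by sorted key (123, 345, 678)
sorted_ids = df.groupby('id').ngroup() + 1

# groupby(sort=False): groups numbered by first appearance, like factorize
appearance_ids = df.groupby('id', sort=False).ngroup() + 1

print(sorted_ids.tolist())
print(appearance_ids.tolist())
```

Which one is "right" depends on whether the labels should reflect id order or row order; for the question's already-sorted data both give the same result.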

To insert it as the first column:

df.insert(loc=0, column='#', value=df.id.factorize()[0]+1)
df
   #   id  color 
0  1  123  white  
1  1  123  white  
2  1  123  white  
3  2  345   blue  
4  2  345   blue  
5  3  678    red  
answered Nov 14 '22 at 23:11 by BENY



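You can also use categorical codes. Note that the codes start at 0 and follow the sorted category order, so add 1 to match the desired output: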

df['id'].astype('category').cat.codes

Output:

0    0
1    0
2    0
3    1
4    1
5    2
dtype: int8
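A self-contained sketch that turns the categorical codes into the 1-based "#" column from the question and places it first, combining this answer with the insert call shown earlier. The DataFrame is rebuilt from the question's data.

```python
import pandas as pd

# The question's data
df = pd.DataFrame({
    'id':    [123, 123, 123, 345, 345, 678],
    'color': ['white', 'white', 'white', 'blue', 'blue', 'red'],
})

# cat.codes is 0-based and follows the sorted categories; + 1 makes it 1-based
labels = df['id'].astype('category').cat.codes + 1

# Put the new labels in the first column position
df.insert(loc=0, column='#', value=labels)
print(df)
```

Because categories are sorted, this numbers ids by value rather than by first appearance; with unsorted ids it behaves like the default groupby().ngroup(), not like factorize.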
answered Nov 14 '22 at 21:11 by Scott Boston
