
Add numbers to duplicate values in a pandas column

I have a data frame like this:

df:
col1     col2
 1        pqr
 3        abc
 2        pqr
 4        xyz
 1        pqr

The value pqr is duplicated in col2. I want to append 1, 2, 3, ... to each occurrence of pqr. The data frame I want to end up with is:

df1
col1      col2
 1        pqr1
 3        abc
 2        pqr2
 4        xyz
 1        pqr3

How can I do this efficiently?

Kallol asked Jan 09 '19 07:01



1 Answer

Use duplicated with keep=False to flag every row whose value is duplicated, then append a per-group counter created by cumcount:

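# Flag every row whose col2 value appears more than once
mask = df['col2'].duplicated(keep=False)
# Append a 1-based per-group counter to the flagged rows only
df.loc[mask, 'col2'] += df.groupby('col2').cumcount().add(1).astype(str)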

Or:

import numpy as np

df['col2'] = np.where(df['col2'].duplicated(keep=False),
                      df['col2'] + df.groupby('col2').cumcount().add(1).astype(str),
                      df['col2'])
print(df)
   col1  col2
0     1  pqr1
1     3   abc
2     2  pqr2
3     4   xyz
4     1  pqr3
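
To make the two building blocks easier to follow, here is a minimal self-contained sketch that rebuilds the sample frame from the question and prints the intermediate mask and counter. The names counter and result are illustrative only, and Series.where is used here in place of np.where purely to keep the example pandas-only.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'col1': [1, 3, 2, 4, 1],
                   'col2': ['pqr', 'abc', 'pqr', 'xyz', 'pqr']})

# True for every row whose col2 value occurs more than once
mask = df['col2'].duplicated(keep=False)

# 1-based running count within each col2 group, converted to strings
counter = df.groupby('col2').cumcount().add(1).astype(str)

# Keep the original value where it is unique, append the counter otherwise
result = df['col2'].where(~mask, df['col2'] + counter)

print(mask.tolist())     # [True, False, True, False, True]
print(counter.tolist())  # ['1', '1', '2', '1', '3']
print(result.tolist())   # ['pqr1', 'abc', 'pqr2', 'xyz', 'pqr3']
```

Note that cumcount numbers every group, including the unique values abc and xyz. The mask is what stops those rows from getting a suffix.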

If you need this only for the pqr values:

import numpy as np
import pandas as pd

mask = df['col2'] == 'pqr'
# Number the matching rows 1..n in order of appearance
df.loc[mask, 'col2'] += pd.Series(np.arange(1, mask.sum() + 1),
                                  index=df.index[mask]).astype(str)
print(df)
   col1  col2
0     1  pqr1
1     3   abc
2     2  pqr2
3     4   xyz
4     1  pqr3
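
As a possible variation on the single-value case, the counter can also be derived from a cumulative sum over the boolean mask instead of np.arange. The sketch below wraps that idea in a hypothetical helper, number_value, which is not part of pandas. It assumes the column holds strings.

```python
import pandas as pd


def number_value(s, value):
    """Append 1, 2, 3, ... to each occurrence of `value` in Series `s`.

    Hypothetical helper for illustration; assumes `s` contains strings.
    """
    mask = s == value
    # cumsum over the boolean mask gives the running number of matches so far
    counter = mask.cumsum().astype(str)
    # Keep non-matching rows unchanged, append the counter to matching rows
    return s.where(~mask, s + counter)


df = pd.DataFrame({'col1': [1, 3, 2, 4, 1],
                   'col2': ['pqr', 'abc', 'pqr', 'xyz', 'pqr']})
df['col2'] = number_value(df['col2'], 'pqr')
print(df['col2'].tolist())  # ['pqr1', 'abc', 'pqr2', 'xyz', 'pqr3']
```

Unlike the duplicated-based approach, this numbers the chosen value even when it occurs only once, so a lone pqr would become pqr1.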
jezrael answered Oct 10 '22 01:10
