How do you replace duplicate values with multiple unique strings in Pandas?

Tags:

pandas

import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]} 
df = pd.DataFrame(data)

Let's say I have a dataframe that looks like this. I want to check the Name column for the value 'Tom': the first time it appears, replace it with 'FirstTom', and the second time it appears, replace it with 'SecondTom'. How do you accomplish this? I've used the replace method before, but only to replace every 'Tom' with a single value. I don't want to append a 1 to the value; I want to change the string to something else entirely.

Edit:

If the df looked more like the one below, how would we check for Tom in both the first and the second column, then replace the first instance with FirstTom and the second instance with SecondTom?

data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':['Tom', 'John', 'Bob', 'Steve']}


asked Jan 28 '20 14:01

Logan0015

3 Answers

We can use groupby with cumcount to number each occurrence of a name:

df.Name=df.Name+df.groupby('Name').cumcount().astype(str)
df
     Name  Age
0    Tom0   20
1    Tom1   21
2   Jack0   19
3  Terry0   18
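
To make the building blocks visible, here is a minimal sketch that prints the intermediate values on the question's frame. The variable names `occurrence` and `group_size` are only illustrative; `transform('size')` is shown because it is a convenient way to tell repeated names from unique ones.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Tom', 'Jack', 'Terry'],
                   'Age': [20, 21, 19, 18]})

# cumcount numbers the rows inside each group, starting at 0
occurrence = df.groupby('Name').cumcount()
print(occurrence.tolist())   # [0, 1, 0, 0]

# transform('size') broadcasts each group's size back onto its rows
group_size = df.groupby('Name')['Name'].transform('size')
print(group_size.tolist())   # [2, 2, 1, 1]
```

Rows where `group_size` is 1 are names that appear only once, so a suffix or prefix can be skipped for them.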

Update

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=df.groupby('Name')


df.Name=df.Name.radd(g.cumcount().add(1).map(suf).mask(g.Name.transform('count')==1,''))
df
     Name  Age
0  1stTom   20
1  2ndTom   21
2    Jack   19
3   Terry   18
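
One caveat about the `suf` lambda: it looks at `n % 10` for every n of 20 or more, so a count such as 111 would come out as "111st". That is unlikely to matter for a handful of duplicates, but if it might, a sketch of a stricter helper could look like the following. The name `ordinal` is hypothetical and not part of pandas.

```python
def ordinal(n: int) -> str:
    """Return n followed by its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 take 'th', so check the last two digits first
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f"{n}{suffix}"


print([ordinal(n) for n in (1, 2, 3, 4, 11, 21, 111)])
# ['1st', '2nd', '3rd', '4th', '11th', '21st', '111th']
```

It can be passed to `.map(...)` in the same place as `suf`.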

Update 2, for duplicates across columns

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))

# stack the frame into one long Series so both columns are checked together
s=df.stack()
g=s.groupby([s.index.get_level_values(0),s])
s=s.radd(g.cumcount().add(1).map(suf).mask(g.transform('count')==1,''))
s=s.unstack()
s
     Name OtherName
0  1stTom    2ndTom
1   Jerry      John
2    Jack       Bob
3   Terry     Steve
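
If the word prefixes from the question (FirstTom, SecondTom) are wanted instead of numeric ones, the same stack/unstack idea can be sketched as below. This is an illustration under assumptions, not the answer's exact method: the `nth` list is hypothetical and only covers four occurrences, and duplicates are counted across the whole frame (row by row) rather than within a single row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Jack', 'Terry'],
                   'OtherName': ['Tom', 'John', 'Bob', 'Steve']})

# Hypothetical labels; a value occurring more than 4 times would raise IndexError
nth = ['First', 'Second', 'Third', 'Fourth']

s = df.stack()                          # one long Series, visited row by row
k = s.groupby(s).cumcount()             # 0-based occurrence index of each value
size = s.groupby(s).transform('size')   # how often each value occurs in total

# Prefix only the values that occur more than once
prefix = k.map(lambda i: nth[i]).where(size > 1, '')
result = (prefix + s).unstack()
print(result)
```

Here the 'Tom' in Name becomes FirstTom and the one in OtherName becomes SecondTom, while all unique values are left unchanged.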


answered Nov 15 '22 15:11

BENY

To add to the existing solutions, you can use the inflect package to generate the ordinal suffixes dynamically:

import inflect
p = inflect.engine()

df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)

        Name  Age
0    Tom_1st   20
1    Tom_2nd   21
2   Jack_1st   19
3  Terry_1st   18

answered Nov 15 '22 13:11

anky

`transform`

nth = ['First', 'Second', 'Third', 'Fourth']

def prefix(d):
    n = len(d)
    if n > 1:
        return d.radd([nth[i] for i in range(n)])
    else:
        return d

df.assign(Name=df.groupby('Name').Name.transform(prefix))

The output below is for a longer example frame that also contains three 'Steve' rows:

          Name  Age
0     FirstTom   20
1    SecondTom   21
2         Jack   19
3        Terry   18
4   FirstSteve   17
5  SecondSteve   16
6   ThirdSteve   15
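
If only one specific value should be renamed, and the new strings are arbitrary rather than a prefix, a boolean mask may be enough. The sketch below assumes a hypothetical `replacements` list supplied by the caller, with at least as many entries as there are matching rows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Tom', 'Jack', 'Terry'],
                   'Age': [20, 21, 19, 18]})

# Hypothetical replacement strings: the i-th 'Tom' becomes replacements[i]
replacements = ['FirstTom', 'SecondTom']

is_tom = df['Name'].eq('Tom')
n_toms = int(is_tom.sum())
if n_toms > len(replacements):
    raise ValueError("not enough replacement strings for every 'Tom'")

# Assign the replacements to the matching rows in order of appearance
df.loc[is_tom, 'Name'] = replacements[:n_toms]
print(df)
```

Rows that do not match the mask, such as Jack and Terry, are left untouched.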

answered Nov 15 '22 15:11

piRSquared
