Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you replace duplicate values with multiple unique strings in Pandas?

Tags:

python

pandas

import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]} 
df = pd.DataFrame(data)

Lets say I have a dataframe that looks like this. I am trying to figure out how to check the Name column for the value 'Tom' and if I find it the first time I replace it with the value 'FirstTom' and the second time it appears I replace it with the value 'SecondTom'. How do you accomplish this? I've used the replace method before but only for replacing all Toms with a single value. I don't want to add a 1 on the end of the value, but completely change the string to something else.

Edit:

If the df looked more like this below, how would we check for Tom in the first column and the second column and then replace the first instance with FirstTom and the second instance with SecondTom

data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':[Tom, John, Bob,Steve]}

like image 899
Logan0015 Avatar asked Jan 28 '20 14:01

Logan0015


People also ask

How replace multiple strings in pandas?

Pandas replace multiple values in column replace. By using DataFrame. replace() method we will replace multiple values with multiple new strings or text for an individual DataFrame column.

How do I get unique values in multiple columns in pandas?

Pandas series aka columns has a unique() method that filters out only unique values from a column. The first output shows only unique FirstNames. We can extend this method using pandas concat() method and concat all the desired columns into 1 single column and then find the unique of the resultant column.

How do you replace duplicates in a string in python?

When it is required to replicate the duplicate occurrence in a string, the keys, the 'index' method and list comprehension can be used. The list comprehension is a shorthand to iterate through the list and perform operations on it.


3 Answers

We can do cumcount

df.Name=df.Name+df.groupby('Name').cumcount().astype(str)
df
     Name  Age
0    Tom0   20
1    Tom1   21
2   Jack0   19
3  Terry0   18

Update

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=df.groupby('Name')


df.Name=df.Name.radd(g.cumcount().add(1).map(suf).mask(g.Name.transform('count')==1,''))
df
     Name  Age
0  1stTom   20
1  2ndTom   21
2    Jack   19
3   Terry   18

Update 2 for column

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))

g=s.groupby([s.index.get_level_values(0),s])
s=s.radd(g.cumcount().add(1).map(suf).mask(g.transform('count')==1,''))
s=s.unstack()
     Name OtherName
0  1stTom    2ndTom
1   Jerry      John
2    Jack       Bob
3   Terry     Steve
like image 188
BENY Avatar answered Nov 15 '22 15:11

BENY


Just adding in to the existing solutions , you can use inflect to create dynamic dictionary

import inflect
p = inflect.engine()

df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)

        Name  Age
0    Tom_1st   20
1    Tom_2nd   21
2   Jack_1st   19
3  Terry_1st   18
like image 24
anky Avatar answered Nov 15 '22 13:11

anky


transform

nth = ['First', 'Second', 'Third', 'Fourth']

def prefix(d):
    n = len(d)
    if n > 1:
        return d.radd([nth[i] for i in range(n)])
    else:
        return d

df.assign(Name=df.groupby('Name').Name.transform(prefix))

          Name  Age
0     FirstTom   20
1    SecondTom   21
2         Jack   19
3        Terry   18
4   FirstSteve   17
5  SecondSteve   16
6   ThirdSteve   15
​
like image 42
piRSquared Avatar answered Nov 15 '22 15:11

piRSquared