Replace unique values of multiple columns with a reference

Question

I am working with a dataframe that has multiple columns, and I want to find the unique values of selected columns and replace them with values from another list.

So for example, this is my dataframe:

import pandas as pd

data = {'col1': ["Bruce Wayne", "Clark Kent", "Peter Parker"], 
'col2': ["Alfred Pennyworth", "Bruce Wayne", "Clark Kent"]}
df = pd.DataFrame(data=data)

#           col1               col2
# 0   Bruce Wayne  Alfred Pennyworth
# 1    Clark Kent        Bruce Wayne
# 2  Peter Parker         Clark Kent

And I have the following list of values that I want to use to replace the unique values in my dataframe:

AlternativeNames = ["Batman", "Superman", "Spiderman", "Batman's butler"]

So the output will be:

        col1             col2
0     Batman  Batman's butler
1   Superman           Batman
2  Spiderman         Superman

You can assume the order does not matter. So if Clark Kent gets mapped to Batman, it is fine. However, the consistency of the mapping is important, so if Clark Kent gets mapped to Batman, it has to be applied everywhere.

I know how to get unique values of multiple columns, and I know about pd.factorize(); however, in this case I have a reference list, and I am not sure how to replace values according to the reference list.

Andreas · Accepted Answer

You can use the pandas Categorical data type:

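# Stack all columns into one Series so every value shares a single set of categories
s = df.stack().astype('category')
# Assigning to .cat.categories was removed in pandas 2.0; rename_categories does the same job
s = s.cat.rename_categories(["Batman", "Superman", "Spiderman", "Batman's butler"])
df = s.unstack()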

              col1       col2
0         Superman     Batman
1        Spiderman   Superman
2  Batman's butler  Spiderman
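
The output differs from the one in the question because the categories of string data are sorted alphabetically by default, so the names are paired with the reference list in that order. Since the question says the order does not matter, that is fine, but it can help to inspect the pairing before applying it. A quick sketch (the names `stacked`, `alt` and `pairing` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Bruce Wayne", "Clark Kent", "Peter Parker"],
    "col2": ["Alfred Pennyworth", "Bruce Wayne", "Clark Kent"],
})
alt = ["Batman", "Superman", "Spiderman", "Batman's butler"]

stacked = df.stack().astype("category")

# Categories come out sorted, so the pairing follows alphabetical order
pairing = dict(zip(stacked.cat.categories, alt))
print(pairing)
```

Here "Alfred Pennyworth" sorts first and is therefore paired with "Batman", which is what the output above shows.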

Alternatively, shorter but harder to read:

alt = ["Batman", "Superman", "Spiderman", "Batman's butler"]
df.replace(dict(zip(df.stack().astype('category').cat.categories, alt)))

              col1       col2
0         Superman     Batman
1        Spiderman   Superman
2  Batman's butler  Spiderman
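
If you want the replacements assigned in order of first appearance rather than alphabetically, or only for some of the columns, one possible variant is to build the mapping yourself with pd.unique and apply it with Series.map. This is a sketch under assumptions, not part of the answer above: `cols`, `uniques`, `mapping` and `result` are hypothetical names, and it assumes the selected columns contain no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Bruce Wayne", "Clark Kent", "Peter Parker"],
    "col2": ["Alfred Pennyworth", "Bruce Wayne", "Clark Kent"],
})
alt = ["Batman", "Superman", "Spiderman", "Batman's butler"]

cols = ["col1", "col2"]  # hypothetical: the columns to remap

# Unique values in order of first appearance, reading column by column
uniques = pd.unique(df[cols].to_numpy().ravel(order="F"))

# zip would silently truncate, so fail loudly if the reference list is too short
if len(uniques) > len(alt):
    raise ValueError(f"{len(uniques)} unique values but only {len(alt)} replacements")

mapping = dict(zip(uniques, alt))

# Apply the same mapping to every selected column so it stays consistent
result = df.copy()
result[cols] = df[cols].apply(lambda s: s.map(mapping))
print(result)
```

With the sample data this pairs Bruce Wayne with Batman, Clark Kent with Superman, Peter Parker with Spiderman and Alfred Pennyworth with Batman's butler, and the same name gets the same replacement in both columns.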

Tags: python, pandas

Josh

1 Answers

Andreas
