I want to achieve something like in this post: Python Dataframe: Remove duplicate words in the same cell within a column in Python, but for the entire dataframe in an efficient way.
My data is a pandas DataFrame with many columns. The cells contain comma-separated strings with many duplicates, and I wish to remove all duplicates within each individual string.
+--------------------+---------+---------------------+
| Col1 | Col2 | Col3 |
+--------------------+---------+---------------------+
| Dog, Dog, Dog | India | Facebook, Instagram |
| Dog, Squirrel, Cat | Norway | Facebook, Facebook |
| Cat, Cat, Cat | Germany | Twitter |
+--------------------+---------+---------------------+
Reproducible example:
import pandas as pd

df = pd.DataFrame({"col1": ["Dog, Dog, Dog", "Dog, Squirrel, Cat", "Cat, Cat, Cat"],
"col2": ["India", "Norway", "Germany"],
"col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
I would like to transform it to this:
+--------------------+---------+---------------------+
| Col1 | Col2 | Col3 |
+--------------------+---------+---------------------+
| Dog | India | Facebook, Instagram |
| Dog, Squirrel, Cat | Norway | Facebook |
| Cat | Germany | Twitter |
+--------------------+---------+---------------------+
Try:
for col in ["col1", "col2", "col3"]:
    df[col] = df[col].str.split(", ").map(set).str.join(", ")
Outputs:
>>> df
col1 col2 col3
0 Dog India Facebook, Instagram
1 Dog, Cat, Squirrel Norway Facebook
2 Cat Germany Twitter
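Note that converting to a `set` does not preserve the original word order, which is why row 1 comes back as "Dog, Cat, Squirrel" rather than "Dog, Squirrel, Cat". If order matters, a sketch using `dict.fromkeys` (which keeps the first occurrence of each key) and looping over all columns rather than a hardcoded list:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["Dog, Dog, Dog", "Dog, Squirrel, Cat", "Cat, Cat, Cat"],
                   "col2": ["India", "Norway", "Germany"],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})

def dedupe(cell):
    # dict.fromkeys drops duplicates while keeping first-occurrence order
    return ", ".join(dict.fromkeys(cell.split(", ")))

# Apply to every column of the DataFrame, not just a hardcoded subset
for col in df.columns:
    df[col] = df[col].map(dedupe)

print(df)
```

This keeps "Dog, Squirrel, Cat" intact in row 1 while still collapsing "Dog, Dog, Dog" to "Dog" and "Facebook, Facebook" to "Facebook".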