Remove duplicates from python dataframe list

Question

I have a pandas DataFrame in which each row of the text_lemmatized column holds a list of words, and those lists contain duplicate words. I want to remove the duplicates from each list.

I tried calling dict.fromkeys(listname) in a for loop over each row of the DataFrame, but this splits the words into individual characters:

import pandas as pd

filepath = "C:/abc5/Python/Clustering/output2.csv"
df = pd.read_csv(filepath,encoding='windows-1252')

df["newlist"] = df["text_lemmatized"]
for i in range(0,len(df)):
    l = df["text_lemmatized"][i]
    df["newlist"][i] = list(dict.fromkeys(l))

print(df)

Expected result (original list, then deduplicated list):

['clear', 'pending', 'order', 'pending', 'order']   ['clear', 'pending', 'order']
['pending', 'activation', 'clear', 'pending']        ['pending', 'activation', 'clear']

Actual result:

['clear', 'pending', 'order', 'pending', 'order']  ...   [[, ', c, l, e, a, r, ,,  , p, n, d, i, g, o, ]]
['pending', 'activation', 'clear', 'pending', ...  ...  [[, ', p, e, n, d, i, g, ,,  , a, c, t, v, o, ...
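A likely cause, assuming output2.csv was written from a DataFrame whose column held Python lists: read_csv returns each cell as a plain string such as "['clear', 'pending', ...]", and dict.fromkeys iterates over a string one character at a time. The sketch below uses a hypothetical cell value built from the first row of the question. Deduplicating the raw string reproduces the character output shown above, while parsing it first with ast.literal_eval gives the word-level result.

```python
import ast

# Hypothetical cell value, shaped like what pd.read_csv returns for a column
# that was written to CSV from Python lists: a str, not a list.
cell = "['clear', 'pending', 'order', 'pending', 'order']"

# Iterating over a str yields single characters, so this deduplicates characters.
chars = list(dict.fromkeys(cell))
print(chars)

# Parsing the str back into a list first deduplicates whole words instead.
words = ast.literal_eval(cell)
deduped_words = list(dict.fromkeys(words))
print(deduped_words)  # ['clear', 'pending', 'order']
```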

Anthony Kong · Accepted Answer

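Use set to remove duplicates. Apply it to each row's list with Series.apply rather than to the whole column; you also don't need the for loop:

  df["newlist"] = df["text_lemmatized"].apply(lambda words: list(set(words)))

Note that set does not preserve word order. To keep the order shown in the expected result, use list(dict.fromkeys(words)) inside the lambda instead. Also, if the column comes straight from read_csv, each cell is a string such as "['clear', 'pending', ...]" rather than a list, which is why the original loop split the words into characters. Convert the column to real lists first, for example with ast.literal_eval from the standard ast module.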
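A self-contained sketch of the whole fix, assuming the CSV stores each list as its string representation (the csv_text sample below is hypothetical, built from the two rows in the question). It parses the cells at load time with read_csv's converters argument and ast.literal_eval, then deduplicates each row two ways: dict.fromkeys keeps the first-seen word order, while set gives the same words in no guaranteed order.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for output2.csv: two rows copied from the question.
# A CSV written from a column of Python lists stores each list as its repr.
csv_text = '''text_lemmatized
"['clear', 'pending', 'order', 'pending', 'order']"
"['pending', 'activation', 'clear', 'pending']"
'''

# Without a converter, each cell comes back as a plain string.
raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.loc[0, "text_lemmatized"]))  # <class 'str'>

# converters= runs ast.literal_eval on every cell while reading,
# turning "['clear', ...]" back into a real list.
df = pd.read_csv(io.StringIO(csv_text), converters={"text_lemmatized": ast.literal_eval})

# Order-preserving dedup: dict keys keep insertion order (Python 3.7+).
df["newlist"] = df["text_lemmatized"].apply(lambda words: list(dict.fromkeys(words)))

# set-based dedup, as in the answer: same words, but order is not guaranteed.
df["newset"] = df["text_lemmatized"].apply(lambda words: list(set(words)))

print(df["newlist"].tolist())
# [['clear', 'pending', 'order'], ['pending', 'activation', 'clear']]
```

Parsing once at load time means the rest of the code works on real lists. Applying ast.literal_eval to the column after loading is an equally valid alternative to converters.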

Tags: python, pandas, dataframe

Anoop Mahajan