I have a pandas DataFrame where each row holds a list of words. The lists contain duplicate words, and I want to remove them.
I tried using dict.fromkeys(listname) in a for loop over the rows of the DataFrame, but this splits the words into individual characters:
import pandas as pd

filepath = "C:/abc5/Python/Clustering/output2.csv"
df = pd.read_csv(filepath, encoding='windows-1252')
df["newlist"] = df["text_lemmatized"]
for i in range(0, len(df)):
    l = df["text_lemmatized"][i]
    df["newlist"][i] = list(dict.fromkeys(l))
print(df)
Expected result:
['clear', 'pending', 'order', 'pending', 'order'] ['clear', 'pending', 'order']
['pending', 'activation', 'clear', 'pending'] ['pending', 'activation', 'clear']
Actual result:
['clear', 'pending', 'order', 'pending', 'order'] ... [[, ', c, l, e, a, r, ,, , p, n, d, i, g, o, ]]
['pending', 'activation', 'clear', 'pending', ... ... [[, ', p, e, n, d, i, g, ,, , a, c, t, v, o, ...
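The brackets and quotes in your actual result suggest that the cells come out of read_csv as strings such as "['clear', 'pending', ...]", not as lists, which is why dict.fromkeys iterates over characters. Parse each cell back into a list first, then remove duplicates row by row. A set would also drop duplicates, but it does not preserve word order and must be applied per row, not to the whole column, so dict.fromkeys is the safer choice here.
Also, you don't need the for loop; apply works on each row:
import ast
df["newlist"] = df["text_lemmatized"].apply(lambda s: list(dict.fromkeys(ast.literal_eval(s))))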
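Here is a minimal, self-contained sketch of a per-row approach. It assumes the column holds stringified lists, as it typically would after a round trip through CSV. The sample data and the helper name dedupe_words are made up for illustration and are not part of the original post.

```python
import ast

import pandas as pd

# Hypothetical sample data: after read_csv, each cell is a string, not a list.
df = pd.DataFrame({
    "text_lemmatized": [
        "['clear', 'pending', 'order', 'pending', 'order']",
        "['pending', 'activation', 'clear', 'pending']",
    ]
})


def dedupe_words(cell):
    """Parse a stringified list if needed, then drop repeats in first-seen order."""
    # ast.literal_eval safely turns "['a', 'b']" back into a Python list.
    words = ast.literal_eval(cell) if isinstance(cell, str) else cell
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(words))


df["newlist"] = df["text_lemmatized"].apply(dedupe_words)
print(df["newlist"].tolist())
# [['clear', 'pending', 'order'], ['pending', 'activation', 'clear']]
```

The isinstance check lets the same helper work if the column already contains real lists, for example when the DataFrame was built in memory instead of being loaded from a file.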