
Removing special characters in a pandas dataframe

I have found information on how this could be done, but nothing has worked for me. I am trying to remove the special character 'ð' from a column. I imported my data from a CSV file with encoding='latin1', because otherwise I kept getting errors. However, a simple DF['Column'].str.replace('ð', '') does not do the trick. I also tried decoding and using the hex value for that character, as recommended in another post, but that did not work either. Help is very much appreciated, and I am willing to post code if necessary.

SKlein asked Dec 14 '22 21:12


1 Answer

Call str.encode followed by str.decode:

df.YourCol.str.encode('utf-8').str.decode('ascii', 'ignore')
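As a minimal sketch of the one-liner above: the sample values are made up for illustration, and only the column name YourCol comes from the answer. Encoding to UTF-8 turns each string into bytes, and decoding as ASCII with errors ignored silently drops every byte that is not plain ASCII.

```python
import pandas as pd

# Hypothetical sample data; only the column name "YourCol" comes from the answer.
df = pd.DataFrame({"YourCol": ["ðabc", "plain", "ð at the start"]})

# Encode each string to UTF-8 bytes, then decode as ASCII and
# drop any byte that cannot be decoded (i.e. every non-ASCII character).
cleaned = df["YourCol"].str.encode("utf-8").str.decode("ascii", errors="ignore")

print(cleaned.tolist())
```

Note that this removes every non-ASCII character, not only 'ð', so accented letters you may want to keep are dropped as well.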

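If you want to do this for multiple columns, you can select them and call df.applymap, which applies a function to every cell (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):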

df[col_list].applymap(lambda x: x.encode('utf-8').decode('ascii', 'ignore'))

Remember that these operations are not in-place: each one returns a new object, so you have to assign the result back to the original columns.
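Putting the multi-column case and the assignment together might look like the sketch below. The data, the column names, and the helper strip_non_ascii are all hypothetical. It also swaps applymap for a column-wise DataFrame.apply, which is a different technique from the one in the answer but avoids the applymap deprecation and leaves missing values alone.

```python
import pandas as pd

# Hypothetical data and column names, for illustration only.
df = pd.DataFrame({
    "name": ["ðJohn", "Ana"],
    "city": ["Reykjavík", "Oslo"],
    "age": [30, 25],
})
col_list = ["name", "city"]


def strip_non_ascii(series):
    """Hypothetical helper: drop every non-ASCII character from a string column."""
    return series.str.encode("utf-8").str.decode("ascii", errors="ignore")


# apply() calls the helper once per selected column. Assigning the result
# back is what actually updates df; without it, df stays unchanged.
df[col_list] = df[col_list].apply(strip_non_ascii)

print(df)
```

Columns outside col_list, such as age here, are not touched.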

cs95 answered Dec 16 '22 10:12