I have been trying to work on this issue for a while. I am trying to remove non-ASCII characters from the DB_user column and replace them with spaces, but I keep getting errors. This is how my data frame looks:
+-----------------------------------------------------------+
| DB_user                               source    count      |
+-----------------------------------------------------------+
| ???/"Ò|Z?)?]??C %??J                  A         10         |
| ?D$ZGU ;@D??_???T(?)                  B         3          |
| ?Q`H??M'?Y??KTK$?Ù‹???ЩJL4??*?_??     C         2          |
+-----------------------------------------------------------+
I was using this function, which I had come across while researching the problem on SO.
def filter_func(string):
    for i in range(0, len(string)):
        if ord(string[i]) < 32 or ord(string[i]) > 126:
            break
    return ''

And then using the apply function:

df['DB_user'] = df.apply(filter_func, axis=1)
I keep getting the error:
'ord() expected a character, but string of length 66 found', u'occurred at index 2'
However, I thought that by using the loop in filter_func I was handling this, since it should only ever pass a single character into ord() at a time. Therefore, the moment it hits a non-ASCII character, it should be replaced by a space.
Could somebody help me out?
Thanks!
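The error itself comes from the row-wise apply: with axis=1, pandas hands filter_func each row as a Series, so string[i] is a whole cell value (here a 66-character DB_user string) rather than a single character, and ord() rejects it. Below is a minimal sketch that applies a character-level filter to the column instead and substitutes spaces as intended; the helper name to_ascii_space is illustrative, not from the original post.

    def to_ascii_space(ch):
        # Keep printable ASCII; map everything else to a space.
        return ch if 32 <= ord(ch) <= 126 else ' '

    def filter_func(string):
        return ''.join(to_ascii_space(ch) for ch in string)

    # Apply to the column so each call receives a single string, not a whole row.
    df['DB_user'] = df['DB_user'].apply(filter_func)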
You can remove non-ASCII characters from a pandas DataFrame with the encode() and decode() string methods: encode() converts a string to bytes using a given encoding, and decode() converts a string of bytes back into a Unicode string.
You can also use the .replace() method to replace the non-ASCII characters with an empty string.
In Python, call string.encode() with 'ascii' as the encoding and errors='ignore', then string.decode(), and you get back a string with the non-ASCII characters stripped out.
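A minimal sketch of that encode/decode round trip on the DB_user column (the example frame is made up for illustration; note that this drops the offending characters rather than replacing them with spaces):

    import pandas as pd

    df = pd.DataFrame({'DB_user': ['naïve Zürich', 'plain ascii'],
                       'source': ['A', 'B'],
                       'count': [10, 3]})

    # Encode to ASCII bytes, silently dropping anything outside ASCII,
    # then decode back to ordinary strings.
    df['DB_user'] = (df['DB_user']
                     .str.encode('ascii', errors='ignore')
                     .str.decode('ascii'))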
You may try this:
df.DB_user.replace({r'[^\x00-\x7F]+':''}, regex=True, inplace=True)
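Since the question asks for spaces rather than outright deletion, the same pattern with a space as the replacement (and without inplace) should work as well; an untested variant:

    df['DB_user'] = df['DB_user'].str.replace(r'[^\x00-\x7F]+', ' ', regex=True)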