I have a feature of my program where the user can upload a csv file, which my program goes through and uses as input. I have one user complaining about a problem where his input is throwing up an error. The error is caused by there being an illegal character that is encoded wrong. The characters is below:
�
Sometimes it appears as a diamond with a "?" in the middle, sometimes it appears as a double diamond with "?" in the middle, sometimes it appears as "\xa0", and sometimes it appears as "\xa0\xa0".
In my program if I do:
print str_with_weird_char
The string will show up in my terminal with the diamond "?" in place of the weird character. If I copy+paste that string into ipython, it will exit with this message:
In [1]: g="blah��blah"
WARNING:
********
You or a %run:ed script called sys.stdin.close() or sys.stdout.close()!
Exiting IPython!
notice how the diamond "?" is double now. For some reason copy+paste makes it double...
In the django traceback page, it looks like this:
UnicodeDecodeError at /chris/import.html
('ascii', 'blah \xa0 BLAH', 14, 15, 'ordinal not in range(128)')
The thing that messes me up is that I can't do anything with this string without it throwing an exception. I tried unicode(), I tried str(), I tried .encode(), I tried .encode("utf-8"), no matter what it throws up an error.
What can I do it get this thing to be a working string?
You can pass, "ignore" to skip invalid characters in .encode/.decode
like "ILLEGAL".decode("utf8","ignore")
>>> "ILLEGA\xa0L".decode("utf8")
...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 6: unexpected code byte
>>> "ILLEGA\xa0L".decode("utf8","ignore")
u'ILLEGAL'
>>>
Declare the coding on the second line of your script. It really has to be second. Like
#!/usr/bin/python
# coding=utf-8
This might be enough to solve your problem all by itself. If not, see str.encode('utf-8') and str.decode('utf-8').
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With