how does the unicode thing works on python2? i just dont get it.
here i download data from a server and parse it for JSON.
Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/hubs/poll.py", line 92, in wait readers.get(fileno, noop).cb(fileno) File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/greenthread.py", line 202, in main result = function(*args, **kwargs) File "android_suggest.py", line 60, in fetch suggestions = suggest(chars) File "android_suggest.py", line 28, in suggest return [i['s'] for i in json.loads(opener.open('https://market.android.com/suggest/SuggRequest?json=1&query='+s+'&hl=de&gl=DE').read())] File "/usr/lib/python2.6/json/__init__.py", line 307, in loads return _default_decoder.decode(s) File "/usr/lib/python2.6/json/decoder.py", line 319, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode obj, end = self._scanner.iterscan(s, **kw).next() File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan rval, next_pos = action(m, context) File "/usr/lib/python2.6/json/decoder.py", line 217, in JSONArray value, end = iterscan(s, idx=end, context=context).next() File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan rval, next_pos = action(m, context) File "/usr/lib/python2.6/json/decoder.py", line 183, in JSONObject value, end = iterscan(s, idx=end, context=context).next() File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan rval, next_pos = action(m, context) File "/usr/lib/python2.6/json/decoder.py", line 155, in JSONString return scanstring(match.string, match.end(), encoding, strict) UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data
thank you!!
EDIT: the following string causes the error: '[{"t":"q","s":"abh\xf6ren"}]'
. \xf6
should be decoded to ö
(abhören)
The Python "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte" occurs when we specify an incorrect encoding when decoding a bytes object. To solve the error, specify the correct encoding, e.g. utf-16 or open the file in binary mode ( rb or wb ).
To decode a string encoded in UTF-8 format, we can use the decode() method specified on strings. This method accepts two arguments, encoding and error . encoding accepts the encoding of the string to be decoded, and error decides how to handle errors that arise during decoding.
Python bytes decode() function is used to convert bytes to string object. Both these functions allow us to specify the error handling scheme to use for encoding/decoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError.
The string you're trying to parse as a JSON is not encoded in UTF-8. Most likely it is encoded in ISO-8859-1. Try the following:
json.loads(unicode(opener.open(...), "ISO-8859-1"))
That will handle any umlauts that might get in the JSON message.
You should read Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I hope that it will clarify some issues you're having around Unicode.
My solution is a bit funny.I never thought that would it be as easy as save as with UTF-8 codec.I'm using notepad++(v5.6.8).I didn't notice that I saved it with ANSI codec initially. I'm using separate file to place all localized dictionary. I found my solution under 'Encoding' tab from my Notepad++.I select 'Encoding in UTF-8 without BOM' and save it. It works brilliantly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With