
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data

How does Unicode handling work in Python 2? I just don't get it.

Here I download data from a server and parse it as JSON.

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/hubs/poll.py", line 92, in wait
    readers.get(fileno, noop).cb(fileno)
  File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/greenthread.py", line 202, in main
    result = function(*args, **kwargs)
  File "android_suggest.py", line 60, in fetch
    suggestions = suggest(chars)
  File "android_suggest.py", line 28, in suggest
    return [i['s'] for i in json.loads(opener.open('https://market.android.com/suggest/SuggRequest?json=1&query='+s+'&hl=de&gl=DE').read())]
  File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 217, in JSONArray
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 183, in JSONObject
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 155, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data

Thank you!

EDIT: The following string causes the error: '[{"t":"q","s":"abh\xf6ren"}]'. The byte \xf6 should be decoded to ö (abhören).

ihucos asked May 30 '11 20:05





2 Answers

The string you're trying to parse as JSON is not encoded in UTF-8. Most likely it is encoded in ISO-8859-1. Decode the response body explicitly before parsing it:

json.loads(unicode(opener.open(...).read(), "ISO-8859-1"))

This decodes the raw bytes to a unicode string first, so any umlauts in the JSON message are handled correctly.
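As a rough, self-contained illustration of the idea, here is a minimal sketch that uses the exact byte string from the question's EDIT instead of a live HTTP request. The helper name `parse_suggestions` is hypothetical, and ISO-8859-1 is the assumed encoding (the server's actual encoding is a guess based on the \xf6 byte). The code is written so that it should also run on Python 3, where `bytes.decode` plays the role of `unicode(...)`.

```python
import json

# Raw bytes as reported in the question; \xf6 is "ö" in ISO-8859-1.
raw = b'[{"t":"q","s":"abh\xf6ren"}]'

# Decoding as UTF-8 fails, which is what the traceback shows.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False


def parse_suggestions(payload, encoding="iso-8859-1"):
    """Decode the byte payload to text first, then parse it as JSON.

    `encoding` is an assumption about what the server sends.
    """
    text = payload.decode(encoding)
    return [item["s"] for item in json.loads(text)]


suggestions = parse_suggestions(raw)
```

If the response carries a charset in its Content-Type header, reading the encoding from there would be more robust than hard-coding it.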

You should read Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I hope that it will clarify some issues you're having around Unicode.

Tadeusz A. Kadłubowski answered Sep 21 '22 10:09



My solution is a bit funny. I never thought it would be as easy as re-saving the file with the UTF-8 encoding. I'm using Notepad++ (v5.6.8), and I hadn't noticed that I had initially saved the file with the ANSI encoding. I keep all localized strings in a separate dictionary file. I found the fix under the 'Encoding' menu in Notepad++: I selected 'Encoding in UTF-8 without BOM' and saved the file. It works brilliantly.
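The same re-save can be done from a script. The sketch below is only an illustration: the function name `reencode_file` is hypothetical, and it assumes that "ANSI" means Windows-1252 (`cp1252`), which is typical on Western European Windows systems but not guaranteed. `io.open` is used because it accepts an `encoding` argument on both Python 2.6+ and Python 3.

```python
import io


def reencode_file(path, src_encoding="cp1252", dst_encoding="utf-8"):
    """Rewrite the file at `path` in place, converting its text encoding.

    `src_encoding` is an assumption; check what the editor really used.
    newline="" keeps the original line endings untouched.
    """
    with io.open(path, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    with io.open(path, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)
```

Writing with `"utf-8"` produces no byte order mark, which matches the "without BOM" choice above.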

rolypoly answered Sep 23 '22 10:09
