UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data

Tags: python, unicode, python-2.x

How does Unicode work in Python 2? I just don't get it.

Here I download data from a server and parse it as JSON:

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/hubs/poll.py", line 92, in wait
    readers.get(fileno, noop).cb(fileno)
  File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/greenthread.py", line 202, in main
    result = function(*args, **kwargs)
  File "android_suggest.py", line 60, in fetch
    suggestions = suggest(chars)
  File "android_suggest.py", line 28, in suggest
    return [i['s'] for i in json.loads(opener.open('https://market.android.com/suggest/SuggRequest?json=1&query='+s+'&hl=de&gl=DE').read())]
  File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 217, in JSONArray
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 183, in JSONObject
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 155, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data

Thank you!

EDIT: The following string causes the error: '[{"t":"q","s":"abh\xf6ren"}]'. The \xf6 should be decoded to ö (abhören).
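A minimal sketch of what goes wrong, using the string from the EDIT above. The byte 0xF6 is not valid UTF-8 in this position, while ISO-8859-1 maps it directly to ö. Treating the data as ISO-8859-1 is an assumption about the server's encoding (Windows-1252 would give the same result for this byte), and the variable names are illustrative only.

```python
# The raw bytes as they arrive from the server (string taken from the EDIT).
raw = b'[{"t":"q","s":"abh\xf6ren"}]'

# Attempt 1: decode as UTF-8. The lone 0xF6 byte is not followed by valid
# continuation bytes, so the codec rejects it.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Attempt 2: decode as ISO-8859-1 (assumed encoding). Every byte maps to
# exactly one character in this encoding, so the call cannot fail.
text = raw.decode('iso-8859-1')
```

The JSON decoder appears to fail on the individual string value abh\xf6ren, where the offending byte sits at index 3. That would be consistent with "position 3-6" in the error message.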


asked May 30 '11 20:05

ihucos

2 Answers

The string you're trying to parse as JSON is not encoded in UTF-8. Most likely it is encoded in ISO-8859-1. Try the following:

json.loads(unicode(opener.open(...).read(), "ISO-8859-1"))

This decodes the bytes to a unicode string before parsing, so any umlauts that appear in the JSON message are handled.
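The same idea can be sketched as a small helper that tries UTF-8 first and falls back to another encoding only when that fails. The function name `parse_suggestions` and its `fallback_encoding` parameter are hypothetical, not part of the original script, and ISO-8859-1 as the fallback is an assumption about the server.

```python
import json


def parse_suggestions(raw_bytes, fallback_encoding='iso-8859-1'):
    """Hypothetical helper: decode the response bytes, then parse the JSON.

    UTF-8 is tried first; if the bytes are not valid UTF-8, the assumed
    fallback encoding is used instead.
    """
    try:
        text = raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        text = raw_bytes.decode(fallback_encoding)
    # Same list comprehension as in the question: collect the 's' fields.
    return [item['s'] for item in json.loads(text)]


# The failing payload from the question now parses.
suggestions = parse_suggestions(b'[{"t":"q","s":"abh\xf6ren"}]')
```

Decoding explicitly before calling json.loads keeps the encoding decision in one visible place. If the server sends a charset in its Content-Type header, using that value would be more reliable than guessing.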

You should read Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I hope that it will clarify some issues you're having around Unicode.

answered Sep 21 '22 10:09

Tadeusz A. Kadłubowski

My solution is a bit funny. I never thought it would be as easy as saving the file with the UTF-8 codec. I'm using Notepad++ (v5.6.8), and I didn't notice that I had initially saved the file with the ANSI codec. I keep all my localized dictionaries in a separate file. I found the fix under the 'Encoding' menu in Notepad++: I selected 'Encoding in UTF-8 without BOM' and saved the file. It works brilliantly.
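For anyone who prefers to do the same conversion from a script, here is a rough sketch. The function name `reencode_to_utf8` is hypothetical, and `cp1252` as the source encoding is an assumption: "ANSI" in Notepad++ means the Windows system code page, which is often but not always Windows-1252.

```python
import io


def reencode_to_utf8(src_path, dst_path, src_encoding='cp1252'):
    """Hypothetical helper: rewrite a text file as UTF-8 without a BOM.

    src_encoding is an assumption; pass the code page the file was
    actually saved with.
    """
    # io.open accepts an encoding on both Python 2.6+ and Python 3.
    with io.open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()
    # The plain 'utf-8' codec does not write a byte order mark.
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(text)
```

Writing to a separate destination path leaves the original file untouched in case the assumed source encoding turns out to be wrong.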

answered Sep 23 '22 10:09

rolypoly
