 

json.dumps(pickle.dumps(u'å')) raises UnicodeDecodeError

Is this a bug?

>>> import json
>>> import cPickle
>>> json.dumps(cPickle.dumps(u'å'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 361, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
asked Feb 26 '23 02:02 by Michael


2 Answers

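The json module expects strings containing text. Pickled data isn't text; it's 8-bit binary. In Python 2, json.dumps tries to decode a byte string as UTF-8 before encoding it, and the pickled bytes aren't valid UTF-8, hence the UnicodeDecodeError from the 'utf8' codec in the traceback.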
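The same point can be illustrated with a minimal sketch, assuming Python 3 rather than the Python 2.6 from the question. In Python 3, pickle.dumps returns a bytes object, and json.dumps rejects bytes outright with a TypeError instead of implicitly decoding them as UTF-8. Protocol 0 is requested explicitly here because it is no longer the default in Python 3.

```python
import json
import pickle

# Python 3: pickle output is a bytes object, not text.
data = pickle.dumps('å', protocol=0)
print(type(data))  # <class 'bytes'>

# json only serializes text (str); bytes are rejected with TypeError.
try:
    json.dumps(data)
except TypeError as exc:
    print('TypeError:', exc)

# Decoding the bytes as UTF-8 (what Python 2's json did implicitly) fails too,
# because protocol 0 writes 'å' as the single byte 0xe5.
try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc)
```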

One simple workaround, if you really need to send pickled data over JSON, is to use base64:

import base64, cPickle, json
j = json.dumps(base64.b64encode(cPickle.dumps(u'å')))
cPickle.loads(base64.b64decode(json.loads(j)))
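A Python 3 adaptation of the same base64 workaround might look like the sketch below. The helper names pickle_to_json and json_to_pickle are hypothetical, chosen for illustration. The one extra step compared to Python 2 is decoding the base64 output to str, since b64encode returns bytes and json.dumps only accepts text.

```python
import base64
import json
import pickle


def pickle_to_json(obj):
    """Hypothetical helper: pickle obj and wrap the bytes in a JSON string."""
    raw = pickle.dumps(obj)
    # b64encode returns ASCII-only bytes; decode them to str so json accepts them.
    return json.dumps(base64.b64encode(raw).decode('ascii'))


def json_to_pickle(text):
    """Hypothetical helper: JSON string -> base64 text -> pickle bytes -> object."""
    raw = base64.b64decode(json.loads(text))
    return pickle.loads(raw)


j = pickle_to_json('å')
print(j)
print(json_to_pickle(j))  # å
```

Keep in mind that pickle.loads should only be called on data from a trusted source, since unpickling can execute arbitrary code.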

Note that this is clearly a Python bug: protocol version 0 is explicitly documented as ASCII, yet å is written as the non-ASCII byte \xe5 instead of being escaped as "\u00E5". The bug was reported upstream, and the ticket was closed without a fix: http://bugs.python.org/issue2980
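One way to check this behavior is a short sketch like the following, assuming Python 3, where protocol 0 has to be requested explicitly. It looks for the raw byte 0xe5 versus an ASCII "\u00e5" escape in the pickled output; behavior on other Python versions may differ.

```python
import pickle

data = pickle.dumps('å', protocol=0)
print(data)

# The raw byte 0xe5 appears, not an ASCII "\u00e5" escape.
print(b'\xe5' in data)      # True
print(b'\\u00e5' in data)   # False

# So the "ASCII" protocol's output cannot be decoded as ASCII.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)
```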

answered Mar 07 '23 21:03 by Glenn Maynard


This could be a bug in pickle. My Python documentation says of the pickle format used here: "Protocol version 0 is the original ASCII protocol and is backwards compatible with earlier versions of Python. [...] If a protocol is not specified, protocol 0 is used."


>>> cPickle.dumps(u'å').decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1: ordinal not in range(128)

That isn't ASCII.

Also (I don't know whether it's relevant, or even a problem), the two pickle implementations produce different output:

 
>>> cPickle.dumps(u'å') == pickle.dumps(u'å')
False
answered Mar 07 '23 23:03 by knitti