Is this a bug?
>>> import json
>>> import cPickle
>>> json.dumps(cPickle.dumps(u'å'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 361, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
The json module expects strings to contain text, so it tries to decode the byte string as UTF-8 and fails. Pickled data isn't text; it's 8-bit binary.
One simple workaround, if you really need to send pickled data over JSON, is to use base64:
import base64
j = json.dumps(base64.b64encode(cPickle.dumps(u'å')))
cPickle.loads(base64.b64decode(json.loads(j)))
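For readers on Python 3, where cPickle is no longer a separate module and json.dumps rejects bytes outright, the same idea can be sketched roughly as follows. The helper names to_json and from_json are hypothetical, made up for this illustration. The extra .decode("ascii") is there because b64encode returns bytes on Python 3.

```python
import base64
import json
import pickle


def to_json(obj):
    """Pickle obj, base64-encode the bytes, and wrap the result in JSON."""
    raw = pickle.dumps(obj)                       # arbitrary 8-bit binary data
    text = base64.b64encode(raw).decode("ascii")  # base64 output is pure ASCII
    return json.dumps(text)


def from_json(payload):
    """Reverse to_json: JSON string -> base64 text -> pickle bytes -> object."""
    text = json.loads(payload)
    return pickle.loads(base64.b64decode(text))


j = to_json(u"å")
assert from_json(j) == u"å"
```

The cost of this approach is size: base64 makes the payload about a third larger than the raw pickle.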
Note that this is very clearly a Python bug. Protocol version 0 is explicitly documented as ASCII, yet å is sent as the non-ASCII byte \xe5 instead of encoding it as "\u00E5". This bug was reported upstream, and the ticket was closed without the bug being fixed: http://bugs.python.org/issue2980
Could be a bug in pickle. My Python documentation says (about the pickle format used): Protocol version 0 is the original ASCII protocol and is backwards compatible with earlier versions of Python. [...] If a protocol is not specified, protocol 0 is used.
>>> cPickle.dumps(u'å').decode('ascii')
Traceback (most recent call last):
  File "", line 1, in
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1: ordinal not in range(128)
That ain't no ASCII.
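The same check can be sketched for Python 3. This assumes its pickle module still writes Latin-1-range characters as raw bytes under protocol 0, which is my understanding of its behaviour rather than something the documentation promises. If that holds, I would expect the second call to report False. The helper name is_ascii is hypothetical.

```python
import pickle


def is_ascii(data):
    """Return True if every byte in data decodes as ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Protocol 0 is requested explicitly, since it is not the default on Python 3.
plain = pickle.dumps(u"abc", protocol=0)
accented = pickle.dumps(u"å", protocol=0)

print(is_ascii(plain))
print(is_ascii(accented))
```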
Also, I don't know whether it's relevant, or even a problem:
>>> cPickle.dumps(u'å') == pickle.dumps(u'å')
False
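Differing byte strings are not necessarily a problem in themselves: the pickle format allows more than one encoding of the same object, so comparing the unpickled values is usually more telling than comparing the raw output. Here is a minimal Python 3 sketch. It uses two protocol versions as a stand-in for the two implementations, which is an assumption made for illustration and does not reproduce the cPickle/pickle difference itself.

```python
import pickle

value = u"å"

# Two different serialisations of the same object.
old_style = pickle.dumps(value, protocol=0)
new_style = pickle.dumps(value, protocol=2)

# The raw bytes differ, yet both load back to the original value.
bytes_match = old_style == new_style
values_match = pickle.loads(old_style) == pickle.loads(new_style) == value
print(bytes_match, values_match)
```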