Python safe serialization that correctly handles str/unicode?

Apart from PyYAML, are there any safe Python data serialization libraries which correctly handle unicode/str?

For example:

>>> json.loads(json.dumps([u"x", "x"]))
[u'x', u'x'] # Both unicode
>>> msgpack.loads(msgpack.dumps([u"x", "x"]))
['x', 'x'] # Neither are unicode
>>> bson.loads(bson.dumps({"x": [u"x", "x"]}))
{u'x': [u'x', 'x']} # Dict keys become unicode
>>> pyamf.decode(pyamf.encode([u"x", "x"])).next()
[u'x', u'x'] # Both are unicode

Note that I want the serializer to be safe (so pickle and marshal are out). PyYAML is an option, but I dislike the complexity of YAML, so I'd like to know whether there are other options.

Edit: it appears that there is some confusion about the nature of my data. Some of it is Unicode (e.g., names) and some of it is binary (e.g., images), so a serialization library which confuses unicode and str is just as useless to me as a library which confuses "42" and 42.
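To make the requirement concrete, a round-trip check could look like the sketch below. It is written for Python 3, where the text/binary split is `str` vs `bytes` rather than `unicode` vs `str`, and the helper names `preserves_types` and `_same` are hypothetical, not part of any library.

```python
import json


def _same(a, b):
    """Recursively compare two values, requiring identical types at every level."""
    if type(a) is not type(b):
        return False
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(_same(a[k], b[k]) for k in a)
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(_same(x, y) for x, y in zip(a, b))
    return a == b


def preserves_types(dumps, loads, value):
    """Return True if loads(dumps(value)) gives back the same values and types."""
    try:
        result = loads(dumps(value))
    except (TypeError, ValueError):
        # The serializer refused the value outright (e.g. json with bytes).
        return False
    return _same(value, result)


# Text and numbers survive a JSON round trip...
text_ok = preserves_types(json.dumps, json.loads, ["x", 42, "42"])
# ...but binary data does not: json.dumps raises TypeError on bytes.
binary_ok = preserves_types(json.dumps, json.loads, ["x", b"x"])
```

Any candidate library can be passed in the same way by supplying its own dump and load functions.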

asked Aug 09 '11 05:08 by David Wolever

1 Answer

Maybe just use Python's repr to store the value, and deserialize it with the ast.literal_eval function:

In [7]: ast.literal_eval(repr({"d": ["x", u"x"]}))
Out[7]: {'d': ['x', u'x']}
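
As a slightly fuller sketch of the same idea, the snippet below wraps repr and ast.literal_eval in a dumps/loads pair. It assumes Python 3, where the distinction is `str` vs `bytes`; the function names and the sample record are made up for illustration. ast.literal_eval only accepts Python literals, so it rejects function calls and other expressions, although the standard library documentation warns that very large or deeply nested input can still exhaust memory or the C stack.

```python
import ast


def dumps(value):
    """Serialize a value built from Python literals to its repr string."""
    return repr(value)


def loads(text):
    """Parse a literal-only string; raises ValueError on anything else."""
    return ast.literal_eval(text)


# Hypothetical record mixing text, binary data, and look-alike values.
record = {
    "name": "Micha\u0142",      # text
    "image": b"\x89PNG\r\n",    # binary
    "count": 42,                # integer
    "label": "42",              # string that merely looks like a number
}

restored = loads(dumps(record))

# Arbitrary expressions are rejected rather than evaluated.
try:
    loads("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The trade-off is that the format is Python-specific, so it only suits data that is both written and read by Python code.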
answered Sep 27 '22 16:09 by Michał Bentkowski