I'm using json.dump() and json.load() to save/read a dictionary of strings to/from disk. The issue is that none of the strings should be unicode, yet they come back as unicode no matter how I set the parameters to dump/load (including ensure_ascii and encoding).
All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F). Any character may be escaped.
Since JSON can represent any Unicode character as an escape sequence \uXXXX, JSON can always be encoded in ASCII.
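For instance, with ensure_ascii left at its default of True, dumps escapes non-ASCII characters in the serialized text; note this only affects the output on disk, and loads still hands back unicode objects either way (Python 2 session, the dictionary contents are just an illustration):

>>> import json
>>> json.dumps({'name': u'caf\xe9'})                    # ensure_ascii=True (default)
'{"name": "caf\\u00e9"}'
>>> json.dumps({'name': u'caf\xe9'}, ensure_ascii=False)
u'{"name": "caf\xe9"}'
>>> json.loads('{"name": "caf\\u00e9"}')['name']        # still unicode on load
u'caf\xe9'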
If you are just dealing with simple JSON objects, you can use the following:
import json

def ascii_encode_dict(data):
    # Encode every key and value of the decoded dict from unicode to str.
    ascii_encode = lambda x: x.encode('ascii')
    return dict(map(ascii_encode, pair) for pair in data.items())

json.loads(json_data, object_hook=ascii_encode_dict)
Here is an example of how it works:
>>> json_data = '{"foo": "bar", "bar": "baz"}'
>>> json.loads(json_data) # old call gives unicode
{u'foo': u'bar', u'bar': u'baz'}
>>> json.loads(json_data, object_hook=ascii_encode_dict) # new call gives str
{'foo': 'bar', 'bar': 'baz'}
This answer works for a more complex JSON structure, and gives a nice explanation of the object_hook parameter. There is also another answer there that recursively takes the result of a json.loads() call and converts all of the Unicode strings to byte strings.
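If you want that recursive behaviour without following the link, a minimal sketch looks like this (Python 2; the helper name byteify is my own, not taken from the linked answer):

import json

def byteify(data):
    # Walk the decoded structure and turn every unicode string into a str,
    # recursing into nested lists and dicts; other types pass through.
    if isinstance(data, unicode):
        return data.encode('ascii')
    if isinstance(data, list):
        return [byteify(item) for item in data]
    if isinstance(data, dict):
        return dict((byteify(key), byteify(value))
                    for key, value in data.iteritems())
    return data

byteify(json.loads('{"outer": {"items": ["a", "b"]}}'))
# -> {'outer': {'items': ['a', 'b']}}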
And if the JSON object contains a mix of datatypes, not only unicode strings, you can use this variant:
def ascii_encode_dict(data):
    # Only encode unicode strings; leave ints, floats, bools, None, etc. untouched.
    ascii_encode = lambda x: x.encode('ascii') if isinstance(x, unicode) else x
    return dict(map(ascii_encode, pair) for pair in data.items())
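A quick check that non-string values pass through untouched (Python 2 session; the sample JSON is just an example):

>>> result = json.loads('{"name": "spam", "count": 3}', object_hook=ascii_encode_dict)
>>> type(result['name']), result['count']
(<type 'str'>, 3)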