I have the following use case:
from my data I produce a JSON file, part of which contains Hebrew words. For example:
import json
j = {}
city =u'חיפה' #native unicode
j['results']= []
j['results'].append({'city':city}) #Also tried to city.encode('utf-8') and other encodings
I want a JSON file that doubles as my app's DB (a micro geo-app) and as a file my users can edit to fix data directly, so I use the json lib:
to_save = json.dumps(j)
with open('test.json','wb') as f: #also tried with w instead of wb flag.
    f.write(to_save)
    f.close()
The problem is that the JSON comes out with the Unicode escaped: u'חיפה' is represented as, for example, u'\u05d7\u05d9\u05e4\u05d4'
Most of the script and the app have no problem reading the Unicode string, but my USERS do! Since they need to edit the JSON directly to contribute to the open-source project, they can't make out the Hebrew text.
So, THE QUESTION: how should I write the JSON so that opening it in another editor shows the Hebrew characters?
I'm not sure this is solvable, because I suspect JSON is Unicode all the way and I can't use ASCII in it, but I'm not sure about that.
Thanks for the help
Use the ensure_ascii=False argument.
>>> import json
>>> city = u'חיפה'
>>> print(json.dumps(city))
"\u05d7\u05d9\u05e4\u05d4"
>>> print(json.dumps(city, ensure_ascii=False))
"חיפה"
According to the json.dump documentation:
If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is False, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.
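As a small illustration of the quoted behaviour, the sketch below (my own check, not part of the documentation) shows that the escaped and the readable output are two spellings of the same JSON value, so switching to ensure_ascii=False should not change what the app reads back:

```python
# -*- coding: utf-8 -*-
import json

city = u'חיפה'

# Default: ASCII-only output, non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(city)
# ensure_ascii=False: the Hebrew characters are kept as they are.
readable = json.dumps(city, ensure_ascii=False)

# Both strings are valid JSON and decode to the same value.
same_value = json.loads(escaped) == json.loads(readable) == city
print(same_value)
```

Only the on-disk representation differs, which is exactly what matters for people editing the file by hand.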
Your code should read as follows:
import json
j = {'results': [u'חיפה']}
to_save = json.dumps(j, ensure_ascii=False)
with open('test.json', 'wb') as f:
    f.write(to_save.encode('utf-8'))
or
import codecs
import json
j = {'results': [u'חיפה']}
to_save = json.dumps(j, ensure_ascii=False)
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    f.write(to_save)
or
import codecs
import json
j = {'results': [u'חיפה']}
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    json.dump(j, f, ensure_ascii=False)
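The snippets above are written for Python 2. If you are on Python 3, a minimal sketch could look like the following, assuming the built-in open() with an explicit encoding is acceptable for your project. The temporary directory and the indent=2 argument are my own additions (the indent only makes the file easier to edit by hand) and are not required:

```python
import json
import os
import tempfile

j = {'results': [{'city': 'חיפה'}]}

# Hypothetical location, used here only so the example does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), 'test.json')

# Text mode with an explicit encoding, so the result does not depend on the platform default.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(j, f, ensure_ascii=False, indent=2)

# Read the file back to check what an editor would see and what the app would load.
with open(path, encoding='utf-8') as f:
    text = f.read()

loaded = json.loads(text)
print('חיפה' in text)   # the Hebrew text is stored unescaped
print(loaded == j)      # the data survives the round trip
```

Users then need to open the file as UTF-8 in their editor to see the Hebrew text correctly.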