Serializing to JSON while retaining Hebrew characters

I have the following use case:

From my data I produce a JSON document, part of which is Hebrew words. For example:

import json
j = {}
city =u'חיפה' #native unicode
j['results']= []
j['results'].append({'city': city})  # also tried city.encode('utf-8') and other encodings

The JSON file doubles as my app's database (a micro geo-app) and as a file my users can edit directly to fix data. To produce it I use the json library:

to_save = json.dumps(j)
with open('test.json', 'wb') as f:  # also tried the 'w' flag instead of 'wb'
    f.write(to_save)  # no f.close() needed; the with block closes the file

The problem is that the Hebrew ends up escaped in the JSON: u'חיפה', for example, is written as "\u05d7\u05d9\u05e4\u05d4".

Most of the scripts and the app have no problem reading the Unicode strings, but my users do! Since contributing to the open-source project means editing the JSON directly, they can't make sense of the Hebrew text.

So, THE QUESTION: how should I write the JSON so that opening it in another editor shows the Hebrew characters?

I'm not sure this is solvable: I suspect JSON is Unicode all the way and I can't use ASCII in it, but I'm not certain about that.

Thanks for the help

asked Aug 28 '13 by alonisser


1 Answer

Use the ensure_ascii=False argument:

>>> import json
>>> city = u'חיפה'
>>> print(json.dumps(city))
"\u05d7\u05d9\u05e4\u05d4"
>>> print(json.dumps(city, ensure_ascii=False))
"חיפה"

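Both outputs are valid JSON for the same value: the \uXXXX form is only an ASCII-safe spelling, and JSON strings may also contain the characters themselves. A small check, written for Python 3 (an assumption; the thread dates from the Python 2 era), that the two forms decode to the identical string:

```python
import json

city = u'חיפה'

escaped = json.dumps(city)                      # default ensure_ascii=True
literal = json.dumps(city, ensure_ascii=False)  # keep the Hebrew letters as-is

# The escaped form is pure ASCII; the literal form contains the letters directly.
assert escaped == '"\\u05d7\\u05d9\\u05e4\\u05d4"'
assert literal == '"חיפה"'

# Decoding either form gives back exactly the original string.
assert json.loads(escaped) == json.loads(literal) == city
```

So ensure_ascii only changes how the file looks to a human, not what a JSON parser reads from it.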
According to the json.dump documentation (Python 2):

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is False, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.
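The documentation's hint about codecs.getwriter() can be sketched as follows, assuming Python 3 and reusing the question's test.json filename and data shape: the binary-mode file is wrapped in a UTF-8 StreamWriter, so each text chunk that json.dump() emits is encoded before it reaches the file.

```python
import codecs
import json

data = {'results': [{'city': u'חיפה'}]}

# Open the file in binary mode and wrap it in a UTF-8 StreamWriter,
# so every text chunk json.dump() writes is encoded to UTF-8 bytes.
with open('test.json', 'wb') as raw:
    writer = codecs.getwriter('utf-8')(raw)
    json.dump(data, writer, ensure_ascii=False)

# Read the raw bytes back to confirm the Hebrew is stored as UTF-8, not as \u escapes.
with open('test.json', 'rb') as raw:
    content = raw.read()

assert u'חיפה'.encode('utf-8') in content
```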

Your code should read as follows:

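import json
j = {'results': [u'חיפה']}
# ensure_ascii=False keeps the Hebrew letters instead of \uXXXX escapes
to_save = json.dumps(j, ensure_ascii=False)
with open('test.json', 'wb') as f:
    # a binary-mode file takes bytes, so encode the JSON text as UTF-8 first
    f.write(to_save.encode('utf-8'))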

or

import codecs
import json
j = {'results': [u'חיפה']}
to_save = json.dumps(j, ensure_ascii=False)
# codecs.open returns a file object that encodes text to UTF-8 on write
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    f.write(to_save)

or

import codecs
import json
j = {'results': [u'חיפה']}
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    # json.dump writes its chunks straight into the UTF-8 encoding writer
    json.dump(j, f, ensure_ascii=False)
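On Python 3 the built-in open() accepts an encoding argument, so codecs is not required. A sketch under that assumption, using the question's {'city': ...} layout; indent=2 is an optional extra (not part of the original answer) that makes the file easier for users to edit by hand:

```python
import json

j = {'results': [{'city': 'חיפה'}]}

# Text-mode file with an explicit UTF-8 encoding; json.dump writes str chunks to it.
# indent=2 is optional and only adds line breaks and indentation for human editors.
with open('test.json', 'w', encoding='utf-8') as f:
    json.dump(j, f, ensure_ascii=False, indent=2)

# Read it back with the same encoding; the Hebrew appears as-is in the file text.
with open('test.json', encoding='utf-8') as f:
    text = f.read()

loaded = json.loads(text)
assert loaded == j
```

Specifying encoding='utf-8' explicitly matters because open() otherwise uses a platform-dependent default encoding.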
answered Nov 15 '22 by falsetru