Python encoding and json dumps

I apologize if this question has been asked before. I am still not clear about encoding in Python 3.2.

I am reading a CSV file (encoded in UTF-8 without a BOM) that contains French accented characters.

Here is the code that opens and reads the CSV file:

import csv

# in_file, id and locale are defined elsewhere in the asker's code
with open(in_file, 'r', encoding='utf-8', newline='') as csvfile:
    fieldnames = ("id", "locale", "message")
    reader = csv.DictReader(csvfile, fieldnames, escapechar="\\")
    for row in reader:
        if row['id'] == id and row['locale'] == locale:
            out = row['message']

I am returning the message (out) as JSON:

jsonout = json.dumps(out, ensure_ascii=True)    
return HttpResponse(jsonout,content_type="application/json; encoding=utf-8")

However, when I preview the result, the French accented e (é) is replaced by \u00e9.

Can you please advise on what I am doing wrong, and what I should do so that the JSON output shows the proper accented e?

Thanks

tkansara asked Feb 23 '16 16:02


2 Answers

You're doing nothing wrong (and neither is Python).

Python's json module simply takes the safe route and escapes non-ASCII characters by default. This is a valid way of representing such characters in JSON, and any conforming parser will restore the proper Unicode characters when parsing the string:

>>> import json
>>> json.dumps({'Crêpes': 5})
'{"Cr\\u00eapes": 5}'
>>> json.loads('{"Cr\\u00eapes": 5}')
{'Crêpes': 5}

Don't forget that JSON is just a representation of your data, and both "ê" and "\u00ea" are valid JSON representations of the string ê. Conforming JSON parsers should handle both correctly.
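To make the equivalence concrete, here is a minimal sketch using only the standard json module. The variable names are illustrative and not part of the original question; the point is only that the escaped and the literal JSON text decode to the same Python string.

```python
import json

# Two JSON texts for the same string (variable names are illustrative).
escaped = '"Cr\\u00eapes"'  # ASCII-only form, using a \u escape
literal = '"Crêpes"'         # the character written directly

decoded_escaped = json.loads(escaped)
decoded_literal = json.loads(literal)

# Both forms decode to the same Python str.
print(decoded_escaped == decoded_literal)
```

So a client that parses the response with a conforming JSON parser should see the accented character either way.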

It is possible to disable this behaviour, though, by passing ensure_ascii=False; see the json.dump documentation:

>>> json.dumps({'Crêpes': 5}, ensure_ascii=False)
'{"Crêpes": 5}'
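
Applied to the question, a rough sketch of producing a UTF-8 response body might look like the following. The helper name build_json_body and the sample message are assumptions made for illustration; the sketch deliberately avoids Django so that it runs with the standard library alone.

```python
import json


def build_json_body(message):
    """Hypothetical helper: serialize `message` to JSON as UTF-8 bytes."""
    # ensure_ascii=False keeps characters such as é as-is instead of \u00e9.
    text = json.dumps(message, ensure_ascii=False)
    # The bytes sent over the wire must then really be UTF-8 encoded.
    return text.encode("utf-8")


# Sample message (an assumption; the question's real data comes from a CSV).
body = build_json_body("café crème")
print(body.decode("utf-8"))
```

In the view from the question, the corresponding change would presumably be to call json.dumps(out, ensure_ascii=False). Note also that the conventional content-type parameter is charset (as in "application/json; charset=utf-8") rather than encoding.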

marcelm answered Sep 18 '22 16:09


As the answer above notes, setting ensure_ascii=False makes the accented characters appear literally in your output. That said, marcelm's answer is still correct: no information is lost with either representation.

Dave J answered Sep 18 '22 16:09