
python: json.dumps can't handle utf-8?

Below is the test program, including a Chinese character:

# -*- coding: utf-8 -*-
import json

j = {"d":"中", "e":"a"}
json = json.dumps(j, encoding="utf-8")

print json

Below is the result. Notice that json.dumps converts the UTF-8 character into a numeric escape sequence!

{"e": "a", "d": "\u4e2d"} 

Why is this broken? Or am I doing something wrong?

asked Nov 15 '10 12:11 by Bin Chen



2 Answers

Looks like valid JSON to me. If you want json to output a string that has non-ASCII characters in it then you need to pass ensure_ascii=False and then encode manually afterward.
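To illustrate, here is a minimal sketch of that approach. It assumes Python 3, where strings are already Unicode and `print` is a function; the `encoding="utf-8"` argument from the question belongs to Python 2 and is left out. The variable names are only for illustration.

```python
import json

data = {"d": "中", "e": "a"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# With ensure_ascii=False the character is kept as-is in the resulting string.
unescaped = json.dumps(data, ensure_ascii=False)

# The "encode manually afterward" step: turn the string into UTF-8 bytes,
# e.g. before writing to a binary file or a socket.
utf8_bytes = unescaped.encode("utf-8")

print(escaped)
print(unescaped)
print(utf8_bytes)
```

Both strings are valid JSON for the same data; the only difference is whether the non-ASCII character is written literally or as an escape.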

answered Oct 01 '22 16:10 by Ignacio Vazquez-Abrams


You should read json.org. The complete JSON specification is in the white box on the right.

There is nothing wrong with the generated JSON. Generators are allowed to generate either UTF-8 strings or plain ASCII strings in which non-ASCII characters are escaped with the \uXXXX notation. In your case, the Python json module chose escaping, so the output contains the escape sequence \u4e2d.

By the way: Any conforming JSON interpreter will correctly unescape this sequence again and give you back the actual character.
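A quick way to check this round trip, sketched for Python 3 (an assumption, since the question uses Python 2; the variable names are made up for the example):

```python
import json

# The escaped form produced by json.dumps and the literal UTF-8 form.
escaped_text = '{"d": "\\u4e2d", "e": "a"}'
literal_text = '{"d": "中", "e": "a"}'

parsed_escaped = json.loads(escaped_text)
parsed_literal = json.loads(literal_text)

# Both parse to the same data; the escape is turned back into the character.
print(parsed_escaped["d"])
print(parsed_escaped == parsed_literal)
```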

answered Oct 01 '22 15:10 by Boldewyn