
Using json.dumps with ensure_ascii=True

When using json.dumps, the default for ensure_ascii is True, but I find myself constantly setting it to False because:

  • If I work with unicode I need to pass it or I'll get str back
  • If I work with str I need to pass it so my chars don't get converted to unicode (encoded within a str)

In which scenarios would you want it to be True? What is the use case for that option?

From the Docs:

If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

What is the benefit of it?

Mario Corchero asked Nov 03 '16 23:11



1 Answer

Writing up thanks to @user2357112

The first thing to understand is that there is no binary representation in JSON. Therefore all strings should consist of valid Unicode code points. If you are trying to json.dumps raw bytes, you might be doing something wrong.
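As a rough illustration of the "no binary in JSON" point, here is a minimal sketch assuming Python 3 (the answer itself talks about Python 2, where str is a byte string). The variable names are made up for the example. In Python 3, json.dumps refuses bytes outright, so you have to decode to text first:

```python
import json

# UTF-8 encoded bytes, not text (Python 3 assumed)
raw = "汉语".encode("utf-8")

try:
    json.dumps(raw)
    bytes_accepted = True
except TypeError:
    # Python 3 does not guess an encoding for bytes
    bytes_accepted = False

# Decoding explicitly turns the bytes into text that JSON can represent
encoded = json.dumps(raw.decode("utf-8"))
```

Decoding at the boundary keeps the choice of encoding explicit instead of leaving it to the json module.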

Then check:

  • json docs
  • Some information about why ensure_ascii works as it works: issue13769
  • ensure_ascii controls two things: when true, it ensures your output contains only ASCII characters (any non-ASCII characters are escaped as \uXXXX); when false, it allows the function to return a unicode object.

Which makes me assume that:

  • When you are encoding text into JSON and all your strings are unicode, it is fine to use ensure_ascii=False, but it might actually make more sense to leave it as True and decode the resulting str. (Per the documentation, dumps doesn't guarantee a unicode object back, though it does return one if you pass unicode.)
  • If you are working with str objects, passing ensure_ascii=False will prevent json from escaping your non-ASCII characters as \uXXXX. You might think you want that, but if you then try to read the output in a browser, for example, weird things might happen.
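One way to picture the "weird things" is a receiver that decodes the document with the wrong charset. The sketch below assumes Python 3, and roundtrip_with_wrong_charset is a hypothetical helper that simulates a sender writing UTF-8 while the reader assumes Latin-1. It is only meant to show why ASCII-only output can be the safer default:

```python
import json

payload = {"name": "汉语"}

escaped = json.dumps(payload)                      # ensure_ascii=True (default)
literal = json.dumps(payload, ensure_ascii=False)  # keeps the raw characters


def roundtrip_with_wrong_charset(doc):
    """Hypothetical helper: write as UTF-8, read back assuming Latin-1."""
    return json.loads(doc.encode("utf-8").decode("latin-1"))


# ASCII-only output looks the same in any ASCII-compatible charset
survives_escaped = roundtrip_with_wrong_charset(escaped) == payload

# The literal output comes back as mojibake instead of the original text
survives_literal = roundtrip_with_wrong_charset(literal) == payload
```

Under these assumptions the escaped document survives the charset mix-up and the literal one does not.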

As for how ensure_ascii affects the result, this table might help:

+-----------------------+--------------+------------------------------+
|         Input         | ensure_ascii |            Output            |
+-----------------------+--------------+------------------------------+
| u"汉语"                | True         | '"\\u6c49\\u8bed"'           |
| u"汉语"                | False        | u'"\u6c49\u8bed"'            |
| u"汉语".encode("utf-8")| True         | '"\\u6c49\\u8bed"'           |
| u"汉语".encode("utf-8")| False        | '"\xe6\xb1\x89\xe8\xaf\xad"' |
+-----------------------+--------------+------------------------------+

Note that the last value is the unicode text encoded into UTF-8 bytes, which other JSON decoders might not be able to parse.
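For comparison, this is a small sketch of the same experiment assuming Python 3, where there is only one text type, so the table collapses to two rows. The variable names are illustrative:

```python
import json

text = "汉语"

escaped = json.dumps(text)                      # ensure_ascii=True is the default
literal = json.dumps(text, ensure_ascii=False)  # non-ASCII characters kept as-is

# The escaped form contains only ASCII characters
escaped_is_ascii = all(ord(ch) < 128 for ch in escaped)

# Both forms decode back to the same text
same_after_loads = json.loads(escaped) == json.loads(literal) == text
```

Both outputs are valid JSON for the same value; the only difference is whether the non-ASCII characters appear escaped or literally.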

Moreover, if you mix types (e.g. a list containing both unicode and str) and use ensure_ascii=False, you can get a UnicodeDecodeError (while encoding into JSON, which is mind-bending), as the module will try to return a unicode object but won't be able to convert the str into unicode using the default encoding (ascii).

Mario Corchero answered Sep 25 '22 08:09