When using json.dumps, the default for ensure_ascii is True, but I find myself continuously setting it to False because:
unicode: I need to pass it or I'll get str back
str: I need to pass it so my chars don't get converted to unicode (encoded within a str)
In which scenarios would you want it to be True? What is the use case for that option?
From the Docs:
If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.
What is the benefit of it?
The json.dump() method writes a Python object, serialized as JSON, to a file. The json.dumps() method encodes a Python object into a JSON-formatted string.
dumps() is used when the result is needed as a string, for example for printing or passing it on to other code. dump() takes an open file-like object as an argument and writes the output to it; dumps() takes no such argument.
dumps() takes a Python object and returns a string.
JSON stands for JavaScript Object Notation.
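As a rough illustration of the difference, here is a minimal sketch assuming Python 3. The sample dictionary is made up, and io.StringIO stands in for a real file opened for writing.

```python
import io
import json

# Hypothetical sample data, used only for illustration.
data = {"language": "汉语", "count": 2}

# dumps() returns the JSON document as a string.
as_text = json.dumps(data)

# dump() writes the same document to a file-like object instead.
# io.StringIO stands in for a file opened with open(path, "w").
buffer = io.StringIO()
json.dump(data, buffer)

print(as_text)
print(buffer.getvalue() == as_text)
```

Both calls accept the same keyword arguments, including ensure_ascii, so the choice between them only decides where the output ends up.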
Writing up thanks to @user2357112
First thing is to understand there is no binary representation in JSON. Therefore all strings should be sequences of valid Unicode code points. If you are trying to json.dumps raw bytes you might be doing something wrong.
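To make that point concrete, here is a small sketch assuming Python 3, where the json module refuses bytes outright instead of guessing an encoding. The variable names are illustrative only.

```python
import json

raw = "汉语".encode("utf-8")  # bytes, not text

# Python 3 raises TypeError for bytes: JSON has no binary type.
try:
    json.dumps(raw)
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

# Decoding to text first gives json.dumps something it can represent.
decoded = json.dumps(raw.decode("utf-8"))

print(bytes_accepted)
print(decoded)
```

Decoding at the boundary, before the data reaches json.dumps, keeps the choice of encoding explicit rather than leaving it to the serializer.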
Then check what ensure_ascii does. It does two things: it ensures your output consists only of valid ASCII characters (even if the data contains Unicode), and, when set to False, it allows the function to return a unicode object. Which makes me assume that when working with unicode the setting to use is ensure_ascii=False, but it might actually make more sense to leave it set to True and decode the str. (As per the specification, dumps doesn't guarantee unicode back, though it does return it if you pass unicode.)
About how ensure_ascii impacts the result, this is a table that might help.
+-------------------------+--------------+------------------------------+
| Input                   | ensure_ascii | Output                       |
+-------------------------+--------------+------------------------------+
| u"汉语"                 | True         | '"\\u6c49\\u8bed"'           |
| u"汉语"                 | False        | u'"\u6c49\u8bed"'            |
| u"汉语".encode("utf-8") | True         | '"\\u6c49\\u8bed"'           |
| u"汉语".encode("utf-8") | False        | '"\xe6\xb1\x89\xe8\xaf\xad"' |
+-------------------------+--------------+------------------------------+
Note that the last value is the UTF-8 encoding of the text as raw bytes, which might not be parseable by other JSON decoders.
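The table above reflects Python 2. As a point of comparison, here is a minimal sketch assuming Python 3, where str is always Unicode text. It also hints at one use case for the default: the escaped form can pass through an ASCII-only channel, while the unescaped form cannot.

```python
import json

text = "汉语"

escaped = json.dumps(text)                      # ensure_ascii=True is the default
literal = json.dumps(text, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)  # "\u6c49\u8bed"
print(literal)  # "汉语"

# Both forms decode back to the same text.
print(json.loads(escaped) == json.loads(literal) == text)

# The escaped form can be encoded as plain ASCII without error.
ascii_bytes = escaped.encode("ascii")

# The unescaped form cannot, so it needs a channel that handles UTF-8.
try:
    literal.encode("ascii")
    literal_is_ascii_safe = True
except UnicodeEncodeError:
    literal_is_ascii_safe = False

print(literal_is_ascii_safe)
```

So the escaped output trades readability and size for portability: it stays intact wherever the transport or the consumer's encoding is unknown.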
Moreover, if you mix types (an array of unicode and str) and use ensure_ascii=False, you can get a UnicodeDecodeError (when encoding into JSON, which is mind-bending), as the module will try to return a unicode object but won't be able to convert the str into unicode using the default encoding (ascii).