When using <code>json.dumps</code>, the default for <code>ensure_ascii</code> is <code>True</code>, but I find myself constantly setting it to <code>False</code> because: <ul> <li>If I work with <code>unicode</code>, I need to pass it or I'll get a <code>str</code> back.</li> <li>If I work with <code>str</code>, I need to pass it so my characters don't get converted to Unicode escapes (encoded within a <code>str</code>).</li> </ul> In which scenarios would you want it to be <code>True</code>? What is the use case for that option? From the docs: <blockquote> If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only. </blockquote> What is the benefit of it?


Using json.dumps with ensure_ascii=True

1 Answer

Writing this up, with thanks to @user2357112.

The first thing to understand is that JSON has no binary type, so every string must be a sequence of valid Unicode code points. If you are passing raw bytes to json.dumps, you are probably doing something wrong.
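
A minimal sketch of that advice, assuming Python 3 and assuming the bytes are known to be UTF-8 (the variable names are only illustrative): decode the bytes to text first, then serialize the text.

```python
import json

# Bytes as they might arrive from a file or socket (UTF-8 is an assumption here).
raw = u"汉语".encode("utf-8")

# Decode to text first, naming the encoding explicitly, then serialize.
text = raw.decode("utf-8")
encoded = json.dumps(text)  # ensure_ascii defaults to True

print(encoded)
```

Naming the encoding at the point of decoding keeps the guess about the byte format out of the json module.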

Then check:

- The json docs.
- Background on why ensure_ascii behaves the way it does: issue13769.
- ensure_ascii controls two things. When true, it guarantees the output contains only ASCII characters (any non-ASCII character is written as a \uXXXX escape). When false, it allows the function to return a unicode object.

From this I conclude that:

- When you are encoding text into JSON and all your strings are unicode, it is fine to use ensure_ascii=False, but it may make more sense to leave it as True and decode the resulting str. (Per the documentation, dumps does not guarantee a unicode result, though it does return one if you pass unicode.)
- If you are working with str objects, passing ensure_ascii=False prevents json from converting your characters to \uXXXX escapes. You might think you want that, but if you then read the output in a browser, for example, strange things can happen.
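
One way to see the trade-off between the two settings, sketched for Python 3 (where dumps always returns text; `payload` is a made-up example): the escaped output fits in pure ASCII, so it should survive any ASCII-compatible channel, while the unescaped output needs an encoding such as UTF-8 to be chosen explicitly.

```python
import json

payload = {u"greeting": u"汉语"}  # illustrative data

escaped = json.dumps(payload)                      # ensure_ascii=True (default)
literal = json.dumps(payload, ensure_ascii=False)  # non-ASCII kept as-is

# The escaped form can be encoded as plain ASCII and decoded back unchanged.
wire = escaped.encode("ascii")
round_tripped = json.loads(wire.decode("ascii"))

# The unescaped form cannot be squeezed into ASCII...
try:
    literal.encode("ascii")
    literal_fits_ascii = True
except UnicodeEncodeError:
    literal_fits_ascii = False

# ...but it round-trips once an encoding such as UTF-8 is chosen.
utf8_round_tripped = json.loads(literal.encode("utf-8").decode("utf-8"))

print(round_tripped == payload, literal_fits_ascii, utf8_round_tripped == payload)
```

This is arguably the main use case for the default: the consumer does not need to know or agree on an encoding.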

The following table shows how ensure_ascii affects the result.

+---------------------------+--------------+------------------------------+
|           Input           | ensure_ascii |            Output            |
+---------------------------+--------------+------------------------------+
| u"汉语"                   | True         | '"\\u6c49\\u8bed"'           |
| u"汉语"                   | False        | u'"\u6c49\u8bed"'            |
| u"汉语".encode("utf-8")   | True         | '"\\u6c49\\u8bed"'           |
| u"汉语".encode("utf-8")   | False        | '"\xe6\xb1\x89\xe8\xaf\xad"' |
+---------------------------+--------------+------------------------------+

Note that the last value is UTF-8 encoded bytes rather than text, which other JSON decoders may not be able to parse.
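
The first two rows of the table can be reproduced with a short sketch. It assumes Python 3, where every string is already Unicode text; there the last two rows cannot occur, because json.dumps rejects bytes with a TypeError.

```python
import json

text = u"汉语"

# One result per ensure_ascii setting, mirroring the first two table rows.
results = {flag: json.dumps(text, ensure_ascii=flag) for flag in (True, False)}

for flag, output in results.items():
    print(flag, repr(output), len(output))

# Python 3 does not accept bytes here, so the str rows have no direct equivalent.
try:
    json.dumps(text.encode("utf-8"))
    bytes_accepted = True
except TypeError:
    bytes_accepted = False
```

Both outputs decode to the same text with json.loads; they differ only in how the characters are written.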

Moreover, if you mix types (for example, a list containing both unicode and str) and use ensure_ascii=False, you can get a UnicodeDecodeError (while encoding to JSON, which is mind-bending): the module tries to return a unicode object but cannot convert the str to unicode using the default encoding (ascii).
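
One possible way to avoid the mixed-type problem is to normalize everything to text before calling dumps. The helper below is a hypothetical sketch, not part of the json module; the name `to_text` and the UTF-8 default are assumptions, and it is written for Python 3.

```python
import json

def to_text(value, encoding="utf-8"):
    """Recursively decode byte strings so the structure holds text only."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_text(item, encoding) for item in value]
    return value  # numbers, booleans, None and text pass through unchanged

# A list mixing text and UTF-8 bytes, like the case described above.
mixed = [u"汉语", u"汉语".encode("utf-8")]

normalized = to_text(mixed)
result = json.dumps(normalized, ensure_ascii=False)

print(result)
```

With every string decoded up front, json never has to guess an encoding, whichever ensure_ascii setting is used.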


answered Sep 25 '22 08:09

Mario Corchero

Tags: python, json, unicode, python-2.7
