I have a JSON file which happens to have a multitude of Chinese and Japanese (and other language) characters. I'm loading it into my Python 2.7 script using io.open as follows:
with io.open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)
I add a new property to the json, all good. Then I attempt to write it back out to another file:
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
json.dump(cards, outfile, ensure_ascii=False)
That's when I get the error TypeError: must be unicode, not str
I tried writing the outfile as binary (with io.open("testJson.json", 'wb') as outfile:), but I end up with stuff like this:
{"multiverseid": 262906, "name": "\u00e6\u00b8\u00b8\u00e9\u009a\u00bc\u00e7\u008b\u00ae\u00e9\u00b9\u00ab", "language": "Chinese Simplified"}
I thought opening and writing it in the same encoding would be enough, as well as the ensure_ascii flag, but clearly not. I just want to preserve the characters that existed in the file before I run my script, without them turning into \u's.
For reference, the json module splits the work across four functions: load() reads a JSON document from an open file object and returns the corresponding Python object (usually a dictionary), while loads() does the same for a JSON document held in a string. In the other direction, dump() serializes a Python object as JSON straight into a file object, and dumps() returns the JSON as a string, which is handy for printing or further processing. Both decoders accept an optional object_hook function, which is called with the result of every decoded JSON object.
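A minimal sketch of the four side by side (the file name and sample data here are made up):

import json

raw = '{"name": "Griffin", "multiverseid": 262906}'
data = json.loads(raw)             # JSON string -> Python dict
text = json.dumps(data)            # Python dict -> JSON string

with open('cards_copy.json', 'w') as f:
    json.dump(data, f)             # Python dict -> JSON file

with open('cards_copy.json') as f:
    data_again = json.load(f)      # JSON file -> Python dict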
Can you try the following?
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
outfile.write(unicode(json.dumps(cards, ensure_ascii=False)))
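This works for data that came straight from json.load, because the decoder returns unicode strings, so json.dumps(..., ensure_ascii=False) already produces unicode and the unicode() call is a no-op. A quick check (the sample value is made up, matching the card from the question):

>>> cards = json.loads(u'{"name": "游隼狮鹫"}')
>>> type(json.dumps(cards, ensure_ascii=False))
<type 'unicode'>

As the next answer explains, this can still fail if 8-bit str values sneak into the dictionary before dumping.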
The reason for this error is the completely stupid behaviour of json.dumps in Python 2:
>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
This, coupled with the fact that io.open with an encoding set only accepts unicode objects (which by itself is right), leads to problems.
If ensure_ascii=False, the return type depends entirely on the types of the keys and values in the dictionary, whereas str is always returned if ensure_ascii=True. Since 8-bit strings can accidentally end up in dictionaries, you cannot blindly convert this return value to unicode; you need to specify the encoding, presumably UTF-8:
>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')
In this case I believe you can use json.dump to write to an open binary file; however, if you need to do something more complicated with the resulting object, you probably need the above code.
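Putting the pieces together, here is a sketch of a small helper (the function name is mine) that writes UTF-8 JSON regardless of which type dumps returns, assuming any 8-bit strings in the data are UTF-8:

import json

def dump_json_utf8(obj, path):
    s = json.dumps(obj, ensure_ascii=False)
    if isinstance(s, str):         # 8-bit string: assume it is UTF-8 (see above)
        s = unicode(s, 'UTF-8')
    with open(path, 'wb') as f:    # binary file, so encode explicitly
        f.write(s.encode('UTF-8'))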
One solution is to end all this encoding/decoding madness by switching to Python 3.
The JSON module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The JSON module assumes UTF-8 encoding, but that can be changed with the encoding argument to load() and dump().
with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)
then:
with open("testJson.json", 'wb') as outfile:
json.dump(cards, outfile, ensure_ascii=False)
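Note that on Python 2 the writing half can still trip up: with ensure_ascii=False, json.dump emits unicode chunks for the non-ASCII strings that json.load produced, and writing those to a binary file falls back to the ASCII codec, raising UnicodeEncodeError. Under Python 3, where every str is unicode, the round trip really is this simple (a sketch):

import json

with open('multiIdName.json', encoding='utf-8') as json_data:
    cards = json.load(json_data)

with open('testJson.json', 'w', encoding='utf-8') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)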
Thanks to @Antti Haapala: the Python 2.x JSON module gives either unicode or str depending on the contents of the object. You will have to add a sense check to ensure the result is unicode before writing through io:
with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
my_json_str = json.dumps(my_obj, ensure_ascii=False)
if isinstance(my_json_str, str):
my_json_str = my_json_str.decode("utf-8")
outfile.write(my_json_str)
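A quick way to sanity-check the result (a sketch, reusing my_obj from above): load the file back and compare.

with io.open("testJson.json", encoding="utf-8") as infile:
    assert json.load(infile) == my_obj  # characters survived the round trip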