 

Python 2.7 JSON dump UnicodeEncodeError

I have a file where each line is a json object like so:

{"name": "John", ...}
{...}

I am trying to create a new file with the same objects, but with certain properties removed from all of them.

When I do this, I get a UnicodeEncodeError. Strangely, if I instead loop over range(n) (for some number n) and use infile.next(), it works just as I want it to.

Why so? How do I get this to work by iterating over infile? I tried using dumps() instead of dump(), but that just makes a bunch of empty lines in the outfile.

import json

# filename and propsToRemove are defined elsewhere (not shown)
with open(filename, 'r') as infile:
    with open('_{}'.format(filename), 'w') as outfile:
        for comment in infile:
            decodedComment = json.loads(comment)
            for prop in propsToRemove:
                # pop with a default avoids handling KeyError
                decodedComment.pop(prop, None)
            json.dump(decodedComment, outfile, ensure_ascii=False)
            outfile.write('\n')

Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f47d' in position 1: ordinal not in range(128)

Thanks for the help!

asked Mar 03 '15 05:03 by Dimitrios




1 Answer

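The problem you are facing is that in Python 2 the standard file.write() method (which json.dump() calls for each chunk of output) cannot write unicode strings that contain non-ASCII characters: it implicitly encodes them with the default 'ascii' codec. json.loads() returns unicode strings, and with ensure_ascii=False, json.dump() writes them out unescaped. From the error message, your data contains the character \U0001f47d (which turns out to be EXTRATERRESTRIAL ALIEN, who knew?), and possibly other non-ASCII characters. This is also most likely why looping over range(n) seemed to work: those first n lines probably contained only ASCII characters. To handle these characters, you can either escape them as ASCII (they'll show up in your output file as \uXXXX escape sequences), or use a file writer that can encode unicode.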
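To make the cause concrete, here is a small sketch (it runs on Python 2.7 and 3) that performs the encoding step explicitly. It assumes that Python 2's file.write() handles a unicode argument the way .encode('ascii') does, which is what the 'ascii' codec named in the traceback indicates. try_encode is a hypothetical helper used only for illustration.

```python
# The character from the traceback: U+1F47D EXTRATERRESTRIAL ALIEN.
alien = u"\U0001f47d"


def try_encode(text, codec):
    """Hypothetical helper: return text encoded with codec, or None on failure."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# Roughly what Python 2's file.write() does with a unicode argument:
# encode with the default 'ascii' codec, which cannot represent U+1F47D.
ascii_result = try_encode(alien, "ascii")

# UTF-8 can represent every code point; this one takes four bytes.
utf8_result = try_encode(alien, "utf-8")

print(ascii_result)       # None
print(repr(utf8_result))  # b'\xf0\x9f\x91\xbd' on Python 3
```

Both options below avoid the failing ASCII step. The first makes the output pure ASCII, and the second swaps in a UTF-8 encoder.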

For the first option, remove ensure_ascii=False from your writing line so that json.dump() uses its default, ensure_ascii=True, which escapes every non-ASCII character:

json.dump(decodedComment, outfile)
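Here is a sketch of the first option applied to a single input line. It assumes, as in the question, that each line holds one JSON object. to_ascii_json_line is a hypothetical name. Because json.dumps() keeps the default ensure_ascii=True, the returned string contains only ASCII, and the alien character is written as the surrogate-pair escape \ud83d\udc7d.

```python
import json


def to_ascii_json_line(line, props_to_remove):
    """Parse one JSON line, drop the given keys, re-serialize as pure ASCII."""
    comment = json.loads(line)
    for prop in props_to_remove:
        comment.pop(prop, None)  # no KeyError if the key is absent
    # Default ensure_ascii=True: every non-ASCII character becomes a \uXXXX
    # escape (characters above U+FFFF become a surrogate pair), so the result
    # can go to a plain open(..., 'w') file without an encoding error.
    return json.dumps(comment) + "\n"
```

In the original loop this would be used as outfile.write(to_ascii_json_line(comment, propsToRemove)). Any JSON parser turns the escapes back into the original character when the file is read again.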

The second option is probably closer to what you want, and an easy way to do it is the codecs module. Import it, and change the second line of your code to:

with codecs.open('_{}'.format(filename), 'w', encoding="utf-8") as outfile:

Then the special characters are saved in their original form, encoded as UTF-8.
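Here is the second option as a sketch of the whole loop. It swaps codecs.open for io.open, which is available in Python 2.6+ and is the built-in open in Python 3, so the same code runs on both. strip_props_utf8 is a hypothetical name. The sketch writes json.dumps() output explicitly because dumps() returns the string instead of writing it. If the earlier dumps() attempt called json.dumps(decodedComment, outfile, ...) and discarded the result, only the '\n' would have reached the file, which would explain the empty lines.

```python
import io
import json


def strip_props_utf8(in_path, out_path, props_to_remove):
    """Copy a JSON-lines file, dropping the given keys from every object.

    Non-ASCII characters are kept as-is and written encoded as UTF-8.
    """
    with io.open(in_path, "r", encoding="utf-8") as infile:
        with io.open(out_path, "w", encoding="utf-8") as outfile:
            for line in infile:
                comment = json.loads(line)
                for prop in props_to_remove:
                    comment.pop(prop, None)  # no KeyError if key is absent
                # dumps() returns the text; it still has to be written.
                text = json.dumps(comment, ensure_ascii=False)
                if isinstance(text, bytes):
                    # Python 2 may return a byte str (e.g. for "{}"), but
                    # io.open's text writer only accepts unicode.
                    text = text.decode("utf-8")
                outfile.write(text + u"\n")
```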

answered Oct 03 '22 08:10 by zplizzi