Python 2.7 JSON dump UnicodeEncodeError

Tags:

python

json

python-2.7

dump

I have a file where each line is a JSON object like so:

{"name": "John", ...}

{...}

I am trying to create a new file with the same objects, but with certain properties removed from all of them.

When I do this, I get a UnicodeEncodeError. Strangely, if I instead loop over range(n) (for some number n) and use infile.next(), it works just as I want it to.

Why so? How do I get this to work by iterating over infile? I tried using dumps() instead of dump(), but that just makes a bunch of empty lines in the outfile.

import json

with open(filename, 'r') as infile:
    with open('_{}'.format(filename), 'w') as outfile:
        for comment in infile:
            decodedComment = json.loads(comment)
            for prop in propsToRemove:
                # use pop to avoid exception handling
                decodedComment.pop(prop, None)
            json.dump(decodedComment, outfile, ensure_ascii = False)
            outfile.write('\n')

Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f47d' in position 1: ordinal not in range(128)

Thanks for the help!


asked Mar 03 '15 05:03

Dimitrios

1 Answer

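The problem you are facing is that, in Python 2, the standard file.write() method (which json.dump() calls for each chunk of output) does not accept unicode strings containing non-ASCII characters: a file opened with plain open() implicitly encodes any unicode argument with the default ASCII codec. Because you pass ensure_ascii = False, json.dump() hands your strings to write() without escaping them. From the error message, your data contains the character \U0001f47d (which turns out to code for the character EXTRATERRESTRIAL ALIEN, who knew?), and possibly other non-ASCII characters. To handle these characters, you can either escape them into pure ASCII (they'll show up in your output file as \uXXXX escape sequences; a character outside the Basic Multilingual Plane, like this one, becomes the surrogate pair \ud83d\udc7d), or use a file writer that can handle unicode.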
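In Python 2, a file opened with plain open() encodes a unicode argument to write() with the default ASCII codec. The sketch below performs that same encoding step explicitly so the failure can be reproduced in isolation; it should run on Python 2.7 and 3, and the helper name encode_like_plain_file is hypothetical, standing in for what a plain Python 2 file does internally.

```python
from __future__ import print_function


def encode_like_plain_file(text):
    # Hypothetical helper: mimics the implicit ASCII encoding that a plain
    # Python 2 file applies to a unicode argument passed to write().
    return text.encode("ascii")


ascii_line = u'{"name": "John"}'
alien_line = u'{"text": "\U0001f47d"}'  # U+1F47D EXTRATERRESTRIAL ALIEN

# Pure-ASCII text encodes without complaint.
print(encode_like_plain_file(ascii_line))

# A character outside range(128) raises a UnicodeEncodeError from the ascii codec.
error_message = None
try:
    encode_like_plain_file(alien_line)
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)
```

Lines made only of ASCII characters pass through unchanged, so the error only appears once a line containing a character like this one is written.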

To do the first option, remove ensure_ascii = False from your writing line (ensure_ascii defaults to True, which makes json escape every non-ASCII character itself):

json.dump(decodedComment, outfile)
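To see what the escaping option produces, the sketch below serializes a small stand-in record (a hypothetical comment, not your real data) with ensure_ascii left at its default of True. The result is pure ASCII, so any file object can write it, and json.loads turns the escapes back into the original character.

```python
import json

# Hypothetical record standing in for one decoded line of the input file.
comment = {u"name": u"John", u"text": u"\U0001f47d"}

# ensure_ascii defaults to True, so every non-ASCII character is written as a
# \uXXXX escape; U+1F47D needs a surrogate pair. sort_keys keeps key order stable.
escaped = json.dumps(comment, sort_keys=True)
print(escaped)  # {"name": "John", "text": "\ud83d\udc7d"}

# Decoding the escaped text restores the original character.
round_trip_ok = json.loads(escaped) == comment
print(round_trip_ok)  # True
```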

The second option is probably closer to what you want, and an easy way to get it is the codecs module. Import it (import codecs), and change your second line, the one that opens outfile, to:

with codecs.open('_{}'.format(filename), 'w', encoding="utf-8") as outfile:

The writer then encodes everything you write as UTF-8, so the special characters are saved in their original form.
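Putting the pieces together, here is a sketch of the whole loop with a UTF-8 writer. It swaps in io.open for codecs.open (io.open accepts an encoding argument on both Python 2.7 and 3, and is the built-in open() on Python 3); the function names strip_stream and strip_file are hypothetical. Because io text streams only accept unicode, the newline is written as u"\n" and any byte-string result from json.dumps is decoded first.

```python
import io
import json


def strip_stream(infile, outfile, props_to_remove):
    """Hypothetical helper: copy JSON lines from infile to outfile,
    dropping the keys in props_to_remove.

    outfile must be a text stream that accepts unicode, such as a file from
    io.open(..., encoding="utf-8") or an io.StringIO.
    """
    for line in infile:
        if not line.strip():
            continue  # skip blank lines instead of failing in json.loads
        decoded = json.loads(line)
        for prop in props_to_remove:
            decoded.pop(prop, None)  # no KeyError if the key is missing
        text = json.dumps(decoded, ensure_ascii=False)
        # On Python 2, dumps() can return a byte str when the output is pure
        # ASCII; io text streams only accept unicode, so decode it first.
        if isinstance(text, bytes):
            text = text.decode("utf-8")
        outfile.write(text + u"\n")


def strip_file(in_path, out_path, props_to_remove):
    """Hypothetical helper: open both files as UTF-8 text and run strip_stream."""
    with io.open(in_path, "r", encoding="utf-8") as infile, \
            io.open(out_path, "w", encoding="utf-8") as outfile:
        strip_stream(infile, outfile, props_to_remove)
```

With the question's variables, the call would be strip_file(filename, '_{}'.format(filename), propsToRemove). Splitting out strip_stream keeps the per-line logic testable with in-memory streams.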


answered Oct 03 '22 08:10

zplizzi
