
How to reproduce UnicodeEncodeError?

I get an error in a production system, which I fail to reproduce in a development environment:

with io.open(file_name, 'wt') as fd:
    fd.write(data)

Exception:

  File "/home/.../foo.py", line 18, in foo
    fd.write(data)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 6400: ordinal not in range(128)

I have already tried putting a lot of strange characters into the variable data.

But so far I have not been able to reproduce a UnicodeEncodeError.

What needs to be in data to get a UnicodeEncodeError?

Update

python -c 'import locale; print locale.getpreferredencoding()'
UTF-8

Update2

If I call locale.getpreferredencoding() via shell and via web request, the encoding is "UTF-8".

A few days ago I updated the exception handling in my code to log getpreferredencoding(). Now the error has happened again (I still cannot force or reproduce it), and the encoding is "ANSI_X3.4-1968"!

I have no clue where this encoding gets set ....

This takes my problem in a different direction and makes this question moot. My problem is now: where does the preferred encoding get altered? But that is not part of this question.

A big thank you, for all who

guettli asked Jan 10 '17 11:01



1 Answer

You are relying on the default encoding for the platform; when that default encoding can't support the Unicode characters you are writing to the file, you get an encoding exception.

From the io.open() documentation:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.

For your specific situation, the default returned by locale.getpreferredencoding() is ASCII, so any Unicode character outside the ASCII range (U+0080 and up) would cause this issue.
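To reproduce the error without touching the locale, one option is to pass the ASCII encoding to io.open() explicitly. The following is a minimal sketch, not the code from the question: the helper name write_text, the temporary file, and the padding with 'x' characters are made up for illustration. Only the character U+00E4 and the position 6400 are taken from the traceback.

```python
import io
import os
import tempfile


def write_text(path, data, encoding):
    """Write data to path as text using the given encoding (hypothetical helper)."""
    with io.open(path, 'wt', encoding=encoding) as fd:
        fd.write(data)


# 6400 ASCII characters followed by U+00E4, mirroring the position in the traceback.
data = u'x' * 6400 + u'\xe4'

# Throwaway file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

try:
    # Forcing 'ascii' simulates a process whose preferred encoding is ASCII.
    write_text(path, data, 'ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)
```

Any string containing at least one character at or above U+0080 should trigger the same exception; the surrounding ASCII characters do not matter.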

Note that the locale is taken from your environment; if it is ASCII, that typically means the locale is set to the POSIX default locale, C.
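One way to see how the environment affects the default is to start a child interpreter with the locale variables forced to C and ask it for its preferred encoding. This is only a sketch under some assumptions: it is written for Python 3, the function name preferred_encoding_under is made up, and the PYTHONUTF8 and PYTHONCOERCECLOCALE settings are there because Python 3.7 and later otherwise switch to UTF-8 under the C locale. The result is platform dependent; on glibc-based Linux systems it is typically ANSI_X3.4-1968.

```python
import os
import subprocess
import sys


def preferred_encoding_under(locale_name):
    """Return locale.getpreferredencoding() as seen by a child interpreter
    started with LC_ALL and LANG set to locale_name (hypothetical helper)."""
    env = dict(os.environ)
    env['LC_ALL'] = locale_name
    env['LANG'] = locale_name
    # Assumption: on Python 3.7+ these two switches turn off UTF-8 mode and
    # C locale coercion, which would otherwise hide the ASCII default.
    env['PYTHONUTF8'] = '0'
    env['PYTHONCOERCECLOCALE'] = '0'
    code = 'import locale; print(locale.getpreferredencoding())'
    out = subprocess.check_output([sys.executable, '-c', code], env=env)
    return out.decode('ascii').strip()


# The printed value depends on the platform and C library.
print(preferred_encoding_under('C'))
```

If a web server, cron job, or init script starts the process without LANG or LC_ALL set, the process may end up in the same situation as the child interpreter here.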

Specify the encoding explicitly:

with io.open(file_name, 'wt', encoding='utf8') as fd:
    fd.write(data)

I used UTF-8 as an example; what you pick depends entirely on your use cases and the data you are trying to write out.

Martijn Pieters answered Oct 17 '22 15:10