
python encoding error only when called as external process

A simple file like

$ cat x.py
x = u'Gen\xe8ve'
print x

when run will give me:

$ python x.py
Genève

However, when run inside a command substitution, it gives:

$ echo $(python x.py)
...
UnicodeEncodeError: 'ascii' codec...

I've tried different terminal emulators (xterm, gnome-terminal) and the console on a ttyS, with both bash and sh, and with Python 2.4 and 2.7. I've tried setting LC_ALL or LANG to a UTF-8 locale before running python, and I've checked sys.getdefaultencoding(). Nothing helped.

The problem arises also when the script is called from another process (like java), but the above was the easiest way I found to replicate it.

I don't understand what the difference between the two calls is. Can anyone help?

Matteo Gamboz asked Aug 07 '12 11:08



2 Answers

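The problem here is that in the second call you are writing to a pipe rather than to a terminal. A pipe is a file-like object that accepts only bytestrings, and Python 2 has no terminal encoding to rely on there, so it falls back to the 'ascii' codec to encode the unicode string. The same happens if you redirect the output to a file: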

$ python x.py > my_file
Traceback (most recent call last):
  File "x.py", line 2, in <module>
    print x
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 3: ordinal not in range(128)
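One way to see the difference between the two calls is to ask Python what it thinks its standard output is. The following is a minimal sketch, not part of the original script; `describe_stdout` is a hypothetical helper name. On Python 2, running it through a pipe or a redirect typically reports `encoding=None`, whereas Python 3 normally sets an encoding even for pipes.

```python
import sys


def describe_stdout(stream=None):
    """Return (is_a_terminal, encoding) for a stream (default: sys.stdout)."""
    stream = sys.stdout if stream is None else stream
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # Streams without an `encoding` attribute are reported as None.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    # Report on stderr so the result stays visible when stdout is redirected.
    sys.stderr.write("isatty=%r encoding=%r\n" % describe_stdout())
```

Saving this as a script and running it once directly and once inside `$(...)` shows which of the two values changes between the calls.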

As the receiver only understands bytestrings and not unicode characters, you must first encode the unicode string into a bytestring using the encode method:

x = u'Gen\xe8ve'.encode('utf-8') 
print x

This will print the unicode string encoded as a UTF-8 bytestring (a sequence of bytes), allowing it to be written to a file-like object.

$ echo $(python x.py)
Genève
$ python x.py
Genève
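If the same encoding step is needed in several places, it can be wrapped in a small helper. This is only a sketch under the assumption that UTF-8 is the encoding the receiver expects; `write_utf8` is a hypothetical name, and the in-memory buffer stands in for the pipe or file.

```python
import io


def write_utf8(text, stream):
    """Encode `text` as UTF-8 and write the bytes, plus a newline, to a binary stream."""
    stream.write(text.encode("utf-8") + b"\n")


# Demonstrate with an in-memory byte stream standing in for a pipe or file.
buf = io.BytesIO()
write_utf8(u"Gen\xe8ve", buf)
data = buf.getvalue()  # the UTF-8 bytes that a receiver would read
```

To write to the real standard output, the stream argument could be `getattr(sys.stdout, "buffer", sys.stdout)`, which selects the byte-oriented stream on Python 3 and plain `sys.stdout` on Python 2.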
Santiago Alessandri answered Sep 20 '22 11:09


As you suspect, Python doesn't know how to encode unicode output when its standard output is not a known terminal. Consider encoding the string before printing it:

# coding: utf-8
x = u'Gen\xe8ve'
print x.encode("utf-8")

Note that the invoking program and your script will need to agree on a common encoding.
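What that agreement might look like from the caller's side can be sketched as follows, assuming both sides settle on UTF-8. The inline `CHILD` script and the `run_child` function are hypothetical stand-ins for the real script and the real invoking program.

```python
import subprocess
import sys

# Hypothetical child script: it writes the string as UTF-8 bytes,
# whether stdout is a terminal, a pipe, or a file.
CHILD = (
    "import sys\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(u'Gen\\xe8ve\\n'.encode('utf-8'))\n"
)


def run_child():
    """Run the child with the current interpreter and decode its output as UTF-8."""
    raw = subprocess.check_output([sys.executable, "-c", CHILD])
    # The caller decodes with the same encoding the child used to encode.
    return raw.decode("utf-8").strip()
```

If the caller decoded with a different codec than the child used, the accented character would come back garbled or raise a decoding error, which is why both ends have to name the same encoding.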

dsign answered Sep 21 '22 11:09