
Stdout encoding in python

Is there a good reason why I shouldn't start all my Python programs with this? Is anything lost by re-executing the interpreter like this?

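#!/usr/bin/python
import os, sys

# If Python could not determine an encoding for stdout (e.g. the output is piped),
# set PYTHONIOENCODING and re-execute the same script with the same arguments.
if sys.stdout.encoding is None:
    os.putenv("PYTHONIOENCODING", 'UTF-8')
    os.execv(sys.executable, ['python'] + sys.argv)
print sys.stdout.encoding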

There are 60 questions about PYTHONIOENCODING, so I guess it's a common problem. In case you don't know, this is done because when sys.stdout.encoding is None you can only print ASCII characters, so e.g. print u"åäö" will throw an exception.

EDIT: This happens to me when stdout is a pipe; running python encoding.py | cat sets the encoding to None.

Another solution is to change the codec of stdout with sys.stdout = codecs.getwriter('utf8')(sys.stdout), which I'm guessing is the correct answer despite the comments on that question.
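A minimal sketch of that codec-wrapping approach, using an in-memory io.BytesIO as a stand-in for a byte-oriented stdout. The stand-in and the variable names are assumptions made for illustration; under Python 2 the wrapped object would be sys.stdout itself:

```python
import codecs
import io

# Hypothetical stand-in for a byte-oriented stdout, so the result can be inspected.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it wraps the byte stream.
writer = codecs.getwriter('utf8')(raw)

# Unicode text written to the wrapper is encoded to UTF-8 before it reaches the stream.
writer.write(u'\xe5\xe4\xf6\n')  # the characters å, ä, ö followed by a newline

# The underlying stream now holds the UTF-8 bytes.
data = raw.getvalue()
```

Each of the three characters ends up as two bytes in the underlying stream, regardless of what encoding Python detected for the real stdout.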

Erik Johansson asked Apr 01 '13 08:04


1 Answer

Yes, there is a good reason not to start all your Python programs like that.

First of all:

sys.stdout.encoding is None if Python doesn't know which encoding stdout supports. In most cases this is because stdout doesn't really support any encoding at all. In your case it's because stdout is a file (here, a pipe) and not a terminal. But it can also be None because Python failed to detect the encoding of the terminal.
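One way to see which situation applies is to inspect the stream directly. The helper below is only a sketch: describe_stream and FakePipe are hypothetical names used for illustration, and FakePipe merely imitates a Python 2 stdout attached to a pipe:

```python
import sys

def describe_stream(stream):
    """Return (is_terminal, encoding) for a file-like object."""
    isatty = getattr(stream, 'isatty', None)
    is_terminal = bool(isatty()) if callable(isatty) else False
    encoding = getattr(stream, 'encoding', None)
    return is_terminal, encoding

class FakePipe(object):
    """Hypothetical object imitating a piped Python 2 stdout."""
    encoding = None

    def isatty(self):
        return False

# A piped stdout under Python 2 would look like this: not a terminal, no encoding.
pipe_info = describe_stream(FakePipe())

# The real stdout of the current process, whatever it happens to be attached to.
stdout_info = describe_stream(sys.stdout)
```

What stdout_info contains depends on how the script is run, so it is worth checking both interactively and with the output piped.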

Second of all: You set the environment variable and then start a new process with the same command again. That's pretty ugly.

So, unless you plan to be the only one using your programs, you shouldn't start them like that. But if you do plan to be the only one using them, then go ahead.

More in-depth explanation

A better generic solution under Python 2 is to treat stdout as what it is: an 8-bit interface. That means anything you print to stdout should be 8-bit data. You get the error when you try to print Unicode data, because print then tries to encode it to the encoding of stdout. If that encoding is None, it assumes ASCII and fails, unless you have set PYTHONIOENCODING.
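The implicit step described above can be imitated by hand. This is only a sketch of the behaviour as described here, not Python's actual implementation, and encode_for_stream is a hypothetical helper name:

```python
def encode_for_stream(text, stream_encoding):
    """Encode text the way print is described to: fall back to ASCII if no encoding is known."""
    # Assumption: a missing encoding (None) is treated as ASCII, as described above.
    encoding = stream_encoding or 'ascii'
    return text.encode(encoding)

text = u'\xe5\xe4\xf6'  # the characters å, ä, ö

# With no known encoding, the ASCII fallback cannot represent these characters.
try:
    encode_for_stream(text, None)
    failed = False
except UnicodeEncodeError:
    failed = True

# With an explicit encoding the same text encodes without trouble.
ok = encode_for_stream(text, 'utf-8')
```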

But by printing encoded data, you don't have this problem. The following works perfectly even when the output is piped:

print u'ÅÄÖ'.encode('UTF8')

(However, this will not work as intended under Python 3. There, stdout is no longer 8-bit IO: you are supposed to give it Unicode data, and it does the encoding itself. If you give it binary data, it prints the representation of that data instead. Therefore on Python 3 you don't have this problem in the first place.)
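A short Python 3 sketch of both points. The in-memory buffer is a hypothetical stand-in for the byte stream underneath stdout, and the variable names are made up for illustration:

```python
import io

text = u'\xc5\xc4\xd6'  # the characters Å, Ä, Ö
encoded = text.encode('utf-8')

# print() converts its argument with str(); for a bytes object in Python 3
# that is the representation, not the characters.
shown = str(encoded)

# A text stream accepts Unicode and encodes it on the way to the byte stream below it.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding='utf-8')
text_stream.write(text)
text_stream.flush()

# The bytes that actually reached the underlying stream.
written = raw.getvalue()
```

Here shown is the literal text b'\xc3\x85\xc3\x84\xc3\x96', while written contains the actual UTF-8 bytes, which is why pre-encoding before printing is unnecessary on Python 3.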

Lennart Regebro answered Sep 27 '22 19:09