Writing unicode strings via sys.stdout in Python

Assume for a moment that one cannot use print (and thus cannot enjoy the benefit of automatic encoding detection). That leaves us with sys.stdout. However, sys.stdout is so dumb that it does not do any sensible encoding.

Now one reads the Python wiki page PrintFails and goes on to try the following code:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)'

However, this too does not work (at least on a Mac). To see why:

>>> import locale, sys
>>> locale.getpreferredencoding()
'mac-roman'
>>> sys.stdout.encoding
'UTF-8'

(UTF-8 is what one's terminal understands).

So one changes the above code to:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout)'

And now unicode strings are properly sent to sys.stdout and hence printed correctly on the terminal (sys.stdout is attached to the terminal).
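The mechanism the codecs.getwriter approach relies on can be sketched in isolation. codecs.getwriter(encoding) returns a StreamWriter class; wrapping a byte stream with it gives an object whose write() accepts unicode text and passes encoded bytes on. This is a minimal sketch, assuming an in-memory io.BytesIO stands in for the terminal so the resulting bytes can be inspected; the helper name wrap_stream and the sample string are hypothetical.

```python
import codecs
import io


def wrap_stream(byte_stream, encoding):
    """Hypothetical helper: return a writer that encodes unicode text to
    `encoding` and forwards the resulting bytes to `byte_stream`."""
    return codecs.getwriter(encoding)(byte_stream)


raw = io.BytesIO()                  # stands in for the terminal's byte stream
writer = wrap_stream(raw, "utf-8")  # the encoding the terminal understands
writer.write(u"\u0411\n")           # CYRILLIC CAPITAL LETTER BE plus a newline
print(repr(raw.getvalue()))         # the encoded bytes that reached the stream
```

The writer only translates text to bytes; which encoding to pass in is exactly the open question above.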

Is this the correct way to write unicode strings to sys.stdout, or should I be doing something else?

EDIT: at times (say, when piping the output to less), sys.stdout.encoding will be None. In that case the above code fails, because codecs.getwriter() cannot look up a codec for None.

asked Sep 24 '09 at 19:09 by Sridhar Ratnakumar




2 Answers

export PYTHONIOENCODING=utf-8

will do the job, but it cannot be set from inside Python itself: the interpreter reads it only at startup, before your script runs.

What we can do is detect that no encoding is set and tell the user to set the variable before calling the script:

import sys

if __name__ == '__main__':
    # In Python 2, sys.stdout.encoding is None when output is piped or redirected
    if sys.stdout.encoding is None:
        sys.stderr.write("please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when writing to stdout.\n")
        sys.exit(1)
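Because PYTHONIOENCODING is read only when the interpreter starts, it has to be present in the environment of whatever launches Python. A minimal sketch, assuming you control the launching process: start a child interpreter with the variable set and let it report the encoding of its stdout, which is a pipe here. The helper name child_stdout_encoding is hypothetical.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Hypothetical helper: run a fresh interpreter with PYTHONIOENCODING
    set and return the encoding it reports for its (piped) stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


# Normalise the reported name, since spellings such as "UTF-8" and "utf-8" vary.
print(codecs.lookup(child_stdout_encoding("utf-8")).name)
```

Setting os.environ inside an already running script would only affect such child processes, not the current interpreter's sys.stdout.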
answered Sep 20 '22 at 18:09 by Sérgio



The best idea is to check whether you are directly connected to a terminal. If you are, use the terminal's encoding; otherwise, use the system's preferred encoding.

import locale
import sys

if sys.stdout.isatty():
    default_encoding = sys.stdout.encoding
else:
    default_encoding = locale.getpreferredencoding()
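The same check can be packaged as a reusable function that works on any stream. This is a sketch, not the answer's exact code: the name default_output_encoding is hypothetical, and the extra fallback for a TTY that reports no encoding is an added assumption.

```python
import io
import locale
import sys


def default_output_encoding(stream):
    """Hypothetical helper: the terminal's encoding for a TTY, the locale's
    preferred encoding otherwise."""
    # Assumption: also fall back to the locale if a TTY reports no encoding.
    if stream.isatty() and getattr(stream, "encoding", None):
        return stream.encoding
    return locale.getpreferredencoding()


print(default_output_encoding(sys.stdout))
print(default_output_encoding(io.BytesIO()))  # an in-memory stream is never a TTY
```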

It's also very important to always allow the user to specify whichever encoding she wants. Usually I make it a command-line option (like -e ENCODING) and parse it with the optparse module.
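A minimal sketch of such an option with optparse, falling back to the terminal-or-locale default from the snippet above when the user gives none. The -e short flag comes from the answer; the --encoding long form, the codecs.lookup validation, and the function name output_encoding_from_args are assumptions made for illustration.

```python
import codecs
import locale
import optparse
import sys


def output_encoding_from_args(argv):
    """Hypothetical helper: return the encoding named by -e/--encoding,
    or the terminal-or-locale default when the option is absent."""
    parser = optparse.OptionParser(usage="%prog [-e ENCODING]")
    parser.add_option("-e", "--encoding", dest="encoding", default=None,
                      help="encoding to use for output (default: auto-detect)")
    options, _args = parser.parse_args(argv)
    if options.encoding:
        try:
            codecs.lookup(options.encoding)  # reject names Python does not know
        except LookupError:
            parser.error("unknown encoding: %s" % options.encoding)
        return options.encoding
    # No explicit choice: terminal encoding for a TTY, locale default otherwise.
    if sys.stdout.isatty() and getattr(sys.stdout, "encoding", None):
        return sys.stdout.encoding
    return locale.getpreferredencoding()


print(output_encoding_from_args(["-e", "latin-1"]))
```

In a real script you would pass sys.argv[1:] instead of the literal list.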

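Another good practice is not to overwrite sys.stdout with an automatic encoder. Create your own encoder and use it, but leave sys.stdout alone: you might import third-party libraries that write already-encoded bytestrings directly to sys.stdout, and an encoding wrapper would try to encode those bytes a second time, which breaks on non-ASCII data.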
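A minimal sketch of keeping a private encoder next to an untouched stream, assuming an io.BytesIO stands in for the real byte stream (in Python 3 that would typically be sys.stdout.buffer). The helper name make_writer and the sample text are hypothetical.

```python
import codecs
import io


def make_writer(byte_stream, encoding):
    """Hypothetical helper: a private encoding writer around `byte_stream`;
    the stream object itself is not replaced or modified."""
    return codecs.getwriter(encoding)(byte_stream)


raw = io.BytesIO()                     # stands in for the real byte stream
out = make_writer(raw, "utf-8")
out.write(u"caf\u00e9\n")              # our unicode output goes through the writer
raw.write(b"already-encoded bytes\n")  # a library writing bytes directly is unaffected
print(repr(raw.getvalue()))
```

Your own code writes text through out, while anything else that holds a reference to the original stream keeps writing bytes exactly as before.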

answered Sep 19 '22 at 18:09 by nosklo
