
Setting the correct encoding when piping stdout in Python

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- coding: utf-8 -*-
print u"åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen thus far are to modify your site.py directly, or to hardcode the default encoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

Is there a better way to make piping work?

Joakim Lundborg asked Jan 29 '09 16:01




1 Answer

Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, there is no terminal to take the encoding from, so you must encode the output yourself.
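To check which case applies, you can inspect the stream itself. This is only a minimal sketch: `describe_stream` is a hypothetical helper name, not something Python provides. On Python 2 the reported encoding is typically `None` when stdout is a pipe, while Python 3 generally picks an encoding from the locale instead.

```python
import sys


def describe_stream(stream):
    """Return (is_terminal, encoding) for a file-like text stream.

    describe_stream is a hypothetical helper used for illustration only.
    """
    # Not every file-like object implements isatty(), so guard the call.
    is_terminal = hasattr(stream, "isatty") and stream.isatty()
    # The encoding attribute may be missing or None (e.g. a Python 2 pipe).
    encoding = getattr(stream, "encoding", None)
    return is_terminal, encoding


if __name__ == "__main__":
    is_terminal, encoding = describe_stream(sys.stdout)
    # Report on stderr so the message stays visible when stdout is piped.
    sys.stderr.write(
        "stdout is a terminal: %s, encoding: %r\n" % (is_terminal, encoding)
    )
```

Running the script once directly and once with its output piped into another command should show the difference between the two situations.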

A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')
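If calling `.encode()` on every print becomes tedious, one possible variation is to wrap the byte stream once with a codec writer from the standard `codecs` module. The sketch below is an assumption about how you might structure this, not part of the answer above: `utf8_writer` is a hypothetical helper name, and an in-memory `io.BytesIO` stands in for the pipe.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so that Unicode text written to it is UTF-8 encoded.

    utf8_writer is a hypothetical helper name used for illustration.
    """
    return codecs.getwriter("utf-8")(byte_stream)


# An in-memory byte stream stands in for the pipe in this demonstration.
pipe = io.BytesIO()
out = utf8_writer(pipe)

# Write Unicode text; the wrapper encodes it before it reaches the stream.
out.write(u"\xe5\xe4\xf6\n")  # the characters å, ä, ö

encoded = pipe.getvalue()  # the raw bytes that arrived in the "pipe"
```

In a real script you would pass the actual output stream instead of `io.BytesIO()`: `sys.stdout` on Python 2, or `sys.stdout.buffer` on Python 3.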

Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.

import sys

for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)
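The example above is written for Python 2, where `sys.stdin` yields byte strings. As a rough sketch of the same decode, process, encode pattern for Python 3, assuming you read and write bytes explicitly, something like the following could work. `latin1_to_upper_utf8` is a hypothetical function name, and the in-memory streams are stand-ins for the real pipes.

```python
import io


def latin1_to_upper_utf8(src, dst):
    """Copy lines from byte stream src to byte stream dst.

    Input is assumed to be ISO-8859-1; output is uppercase UTF-8.
    The function name is hypothetical and used for illustration only.
    """
    for raw_line in src:
        # Decode what you receive:
        text = raw_line.decode("iso8859-1")

        # Work with Unicode internally:
        text = text.upper()

        # Encode what you send:
        dst.write(text.encode("utf-8"))


# In-memory byte streams stand in for stdin and stdout in this demonstration.
source = io.BytesIO(u"sm\xf6rg\xe5s\n".encode("iso8859-1"))  # "smörgås"
sink = io.BytesIO()
latin1_to_upper_utf8(source, sink)
result = sink.getvalue()
```

In a pipeline on Python 3 you would call it as `latin1_to_upper_utf8(sys.stdin.buffer, sys.stdout.buffer)`, since the `.buffer` attributes expose the underlying byte streams.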

Setting the system default encoding is a bad idea, because some of the modules and libraries you use may rely on it being ASCII. Don't do it.

nosklo answered Sep 21 '22 05:09
