 

Python print works differently on different servers

When I try to print a unicode string on my dev server, it works correctly, but the production server raises an exception.

File "/home/user/twistedapp/server.py", line 97, in stringReceived
    print "sent:" + json
File "/usr/lib/python2.6/dist-packages/twisted/python/log.py", line 555, in write
    d = (self.buf + data).split('\n')
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 28: ordinal not in range(128)

Actually, it is a Twisted application, and print output is forwarded to a log file.

The repr() of the strings is the same on both servers, and the locale is set to en_US.UTF-8.

Are there any configs I need to check to make it work the same on both servers?
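One way to compare the two servers is to dump the encoding-related settings each process actually sees. The sketch below uses a hypothetical helper, `collect_encoding_info` (not part of any library), that only reads standard `sys`, `locale`, and `os` values. It is worth running it the same way the real application is started (for example under twistd rather than from an interactive shell), because redirecting standard output can change the results.

```python
import locale
import os
import sys


def collect_encoding_info(stream=None):
    """Gather the settings that influence how unicode text gets printed.

    Hypothetical diagnostic helper: run it on both servers, started the
    same way as the real application, and compare the results.
    """
    if stream is None:
        stream = sys.stdout
    return {
        # Encoding Python uses when printing unicode to this stream (may be None).
        'stdout_encoding': getattr(stream, 'encoding', None),
        # Codec for implicit str/unicode conversions ('ascii' by default on Python 2).
        'default_encoding': sys.getdefaultencoding(),
        # Encoding suggested by the current locale settings.
        'preferred_encoding': locale.getpreferredencoding(),
        # Environment variables that commonly affect the values above.
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        'PYTHONIOENCODING': os.environ.get('PYTHONIOENCODING'),
    }


if __name__ == '__main__':
    for key, value in sorted(collect_encoding_info().items()):
        print('%s: %r' % (key, value))
```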

asked Sep 18 '10 15:09 by Soid


2 Answers

Printing of Unicode strings relies on sys.stdout (the process's standard output) having a correct .encoding attribute that Python can use to encode the unicode string into a byte string before writing it out. That setting depends on how the OS is set up, where standard output is directed, and so forth.

If there's no such attribute, the default codec, ascii, is used, and, as you've seen, it often does not provide the desired results;-).

You can check getattr(sys.stdout, 'encoding', None) to see if the encoding is there (if it is, you can just keep your fingers crossed that it's correct... or, maybe, try some heavily platform-specific trick to guess at the correct system encoding to check;-). If it isn't, in general, there's no reliable or cross-platform way to guess what it could be. You could try 'utf8', the universal encoding that works in a lot of cases (surely more than ascii does;-), but it's really a spin of the roulette wheel.
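The check described above can be wrapped in a small helper. This is only a sketch: the function name `guess_output_encoding` and the 'utf8' fallback are illustrative choices, and the fallback is still a guess rather than a guarantee.

```python
import sys


def guess_output_encoding(stream=None, fallback='utf8'):
    """Return the encoding declared by `stream`, or `fallback` if it has none.

    The fallback is the roulette-wheel guess: nothing here verifies that
    the device behind the stream really expects that encoding.
    """
    if stream is None:
        stream = sys.stdout
    # Redirected or replaced streams may lack the attribute or set it to None.
    encoding = getattr(stream, 'encoding', None)
    return encoding or fallback
```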

For more reliability, your program should have its own configuration file to tell it what output encoding to use (maybe with 'utf8' just as the default if not otherwise specified).
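A minimal sketch of that configuration lookup, assuming an INI file with a hypothetical `[output]` section and `encoding` option (the layout and the `load_output_encoding` name are illustrative, not an established convention). It uses the standard-library config parser under its Python 3 or Python 2 module name and validates the codec name up front.

```python
import codecs

try:
    import configparser  # Python 3
except ImportError:  # Python 2
    import ConfigParser as configparser


def load_output_encoding(path, default='utf8'):
    """Read the output encoding from an INI-style config file.

    Expects a hypothetical section such as:

        [output]
        encoding = utf8

    Falls back to `default` when the file, section, or option is missing.
    """
    parser = configparser.RawConfigParser()
    parser.read(path)  # files that cannot be opened are silently skipped
    if parser.has_option('output', 'encoding'):
        encoding = parser.get('output', 'encoding')
    else:
        encoding = default
    # Fail fast on a typo in the config instead of at the first print.
    codecs.lookup(encoding)  # raises LookupError for unknown codec names
    return encoding
```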

It's also better, for portability, to perform your own encoding, that is, not

print someunicode

but rather

print someunicode.encode(thecodec)

and actually, if you'd rather have incomplete output than a crash,

print someunicode.encode(thecodec, 'ignore')

(which simply skips non-encodable characters), or, usually better,

print someunicode.encode(thecodec, 'replace')

(which uses question-mark placeholders for non-encodable characters).
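The explicit-encoding approach can be packaged as a small function that encodes the text itself and writes bytes. This is a sketch: `write_encoded` is a hypothetical name, and it assumes `stream` accepts bytes (a file opened in binary mode, `io.BytesIO`, `sys.stdout` on Python 2, or usually `sys.stdout.buffer` on Python 3). The `errors` argument is the standard codec error handler, so 'strict', 'ignore', and 'replace' behave as described above.

```python
import io


def write_encoded(stream, text, codec='utf8', errors='replace'):
    """Encode `text` explicitly and write the bytes, plus a newline, to `stream`.

    With errors='strict', unencodable characters raise UnicodeEncodeError;
    'ignore' drops them; 'replace' substitutes '?'.
    """
    data = (text + u'\n').encode(codec, errors)
    stream.write(data)
    return data


if __name__ == '__main__':
    out = io.BytesIO()
    write_encoded(out, u'caf\xe9', 'ascii', 'ignore')   # writes b'caf\n'
    write_encoded(out, u'caf\xe9', 'ascii', 'replace')  # writes b'caf?\n'
    write_encoded(out, u'caf\xe9', 'utf8')              # writes b'caf\xc3\xa9\n'
    print(repr(out.getvalue()))
```

Returning the encoded bytes as well as writing them makes the function easy to check in isolation.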

answered Oct 27 '22 17:10 by Alex Martelli


Unicode is not supported by Twisted's built-in log observers. See http://twistedmatrix.com/trac/ticket/989 for progress on adding support for this, or to see what you can do to help out.

Until #989 is resolved and the fix is included in the Twisted release your application is deployed on, do not log unicode; log only str.
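A minimal sketch of "only log str" for the Python 2 setup in the question: convert unicode to a byte string explicitly before it reaches the log, so the logging code never mixes byte strings with unicode (on Python 2, that mix triggers the implicit ascii conversion seen in the traceback). The helper name `as_log_bytes` and the UTF-8 default are assumptions, not part of Twisted.

```python
def as_log_bytes(message, encoding='utf-8'):
    """Return `message` as a byte string for Python 2 era Twisted logging.

    Byte strings (str on Python 2) pass through unchanged; unicode text is
    encoded explicitly with `encoding` instead of relying on ascii.
    """
    if isinstance(message, bytes):
        return message
    return message.encode(encoding)
```

Under Python 2, the failing line from the question could then become something like `print as_log_bytes("sent:" + json)`, and calls to `twisted.python.log.msg` could wrap their message the same way.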

answered Oct 27 '22 17:10 by Jean-Paul Calderone