
Why does a Python CGI script fail on Unicode output?

If I run this code in a console, it works fine (the text is in Russian), but if I run it as a CGI script on an Apache2 server, it fails with: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128). The code is:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cgitb
cgitb.enable()

print "Content-Type: text/html;charset=utf-8"
print 
s=u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
print s#.encode('utf-8')

Yes, the solution is to uncomment .encode('utf-8'), but I have spent a long time trying to understand why this happens, and I can't find the answer.

scythargon asked Aug 01 '12 17:08


2 Answers

When running from a console, Python can detect the console's encoding and implicitly converts Unicode strings printed to the console to that encoding. This can still fail if that encoding doesn't support the characters you are trying to print. UTF-8 supports all Unicode characters, but other common console encodings, such as cp437 on US Windows, don't.

When stdout is not a console, Python 2.X cannot determine an encoding and falls back to ASCII. That's why in a web server you have to be explicit and encode your output yourself.
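The failure can be reproduced without a web server, because the implicit conversion amounts to encoding the string with the ASCII codec. A minimal sketch, using the string from the question (the variable names `error` and `encoded` are only for illustration; it should run on Python 2.6+ and Python 3.3+):

```python
# The same string as in the question.
s = u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'

# Roughly what print does when stdout has fallen back to ASCII.
try:
    s.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Encoding explicitly as UTF-8 works for any Unicode string.
encoded = s.encode('utf-8')
```

Here `error.start` and `error.end` mark the first run of characters that ASCII cannot represent, which is where the "position 8-9" in the traceback comes from.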

As an example, try the following script from a console and from your webserver:

import sys
print sys.stdout.encoding

From the console you should get some encoding, but from the web server you should get None. Note that when the encoding cannot be determined, Python 2.X uses ascii, whereas Python 3.X falls back to the locale's preferred encoding, which is usually (but not always) utf-8.
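If you want one code path that works both at a console and under a web server, the fallback can be made explicit. This is only a sketch: `encode_for` and `PipeLikeStream` are hypothetical names made up for this example, and UTF-8 is an assumed default that matches the charset in the question's Content-Type header.

```python
def encode_for(stream, text, fallback='utf-8'):
    """Encode text for stream, using fallback when the stream reports no encoding."""
    encoding = getattr(stream, 'encoding', None) or fallback
    return text.encode(encoding)


class PipeLikeStream(object):
    """Hypothetical stand-in for a Python 2 stdout that is not a console."""
    encoding = None


s = u'\u043d\u0435'
data = encode_for(PipeLikeStream(), s)  # no encoding reported, so UTF-8 is used
```

In a Python 2 script you would pass `sys.stdout` as the stream and write the resulting bytes. Note that this can still fail at a console whose encoding (cp437, for example) cannot represent the characters.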

The problem can also occur at a console when redirecting output. This script:

import sys
print >>sys.stderr,sys.stdout.encoding
print >>sys.stderr,sys.stderr.encoding

returns the following when run directly vs. redirecting stdout:

C:\>test
cp437
cp437

C:\>test >out.txt
None
cp437

Note stderr wasn't affected since it wasn't redirected.

The environment variable PYTHONIOENCODING can also be used to override the default encoding of stdin, stdout, and stderr.
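To see the effect of PYTHONIOENCODING on a stdout that is not a console, you can start a child interpreter with its output piped. A rough sketch, assuming Python 3 for the parent process; `child_stdout_encoding` is a hypothetical helper name:

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with piped stdout and report its sys.stdout.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    # Normalise spellings such as "UTF-8" or "utf8" to the codec's canonical name.
    return codecs.lookup(out.decode('ascii').strip()).name


forced = child_stdout_encoding('utf-8')
```

For a CGI script the variable has to be present in the environment the web server gives the script, so it needs to be set in the server configuration (with Apache, mod_env's SetEnv directive is one candidate) rather than in your login shell.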

Mark Tolonen answered Oct 30 '22 19:10


Try wrapping stdin and stdout with the UTF-8 codec:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cgitb
import sys
import codecs

sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
# If you need input too, read from char_stream as you would sys.stdin
char_stream = codecs.getreader('utf-8')(sys.stdin)

cgitb.enable()

print "Content-Type: text/html;charset=utf-8"
print 
s=u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
print s
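
To check what the wrapper does without touching the real stdout, the same `codecs.getwriter` call can be applied to an in-memory byte buffer. A small sketch, with `io.BytesIO` standing in for the byte-oriented stdout (an assumption for demonstration only; it should behave the same on Python 2.6+ and Python 3):

```python
import codecs
import io

raw = io.BytesIO()                        # stands in for the byte-oriented stdout
writer = codecs.getwriter('utf-8')(raw)   # same wrapping as in the script above

s = u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
writer.write(s + u'\n')                   # Unicode in, UTF-8 bytes out

written = raw.getvalue()
```

The writer encodes every Unicode string before it reaches the underlying stream, so the implicit ASCII conversion never takes place.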

DrSkippy answered Oct 30 '22 20:10