If I run this code in the console, it works fine (the text is in Russian), but if I run it as a CGI script on an Apache2 server, it fails with:
<type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128)
The code is:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import cgitb
cgitb.enable()
print "Content-Type: text/html;charset=utf-8"
print
s=u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
print s#.encode('utf-8')
Yes, the solution is to uncomment .encode('utf-8'), but I have spent a lot of time trying to understand why this happens and I can't see the answer.
When running from the console, Python can detect the console's encoding and implicitly encodes Unicode strings printed to the console in that encoding. It can still fail if that encoding doesn't support the characters you are trying to print. UTF-8 supports all Unicode characters, but other common console encodings, such as cp437 on US Windows, don't.
When stdout is not a console, Python 2.X defaults to ASCII because it can't determine a console encoding. That's why, under a web server, you have to be explicit and encode your output yourself.
As an example, try the following script from a console and from your webserver:
import sys
print sys.stdout.encoding
From the console you should get some encoding, but from the web server (under Python 2.X) you should get None. Note that when the encoding cannot be determined, Python 2.X falls back to ascii, while Python 3.X falls back to the locale's preferred encoding, which is usually utf-8.
The problem can also occur at a console when redirecting output. This script:
import sys
print >>sys.stderr,sys.stdout.encoding
print >>sys.stderr,sys.stderr.encoding
produces the following output when run directly vs. with stdout redirected:
C:\>test
cp437
cp437
C:\>test >out.txt
None
cp437
Note that stderr wasn't affected, since it wasn't redirected.
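The same experiment can be scripted. This Python 3 sketch (an illustration, not part of the original answer) runs a child interpreter with its stdout connected to a pipe instead of a console, which is the situation a CGI script is in. Unlike Python 2, Python 3 reports a concrete encoding taken from the locale rather than None.

```python
import subprocess
import sys

# Child script: is stdout a terminal, and which encoding did Python choose for it?
CHILD = "import sys; print(sys.stdout.isatty()); print(sys.stdout.encoding)"

# Piping the child's stdout is the programmatic equivalent of `test >out.txt`.
result = subprocess.run([sys.executable, "-c", CHILD],
                        stdout=subprocess.PIPE, check=True)
is_tty, encoding = result.stdout.decode("ascii").split()

print("stdout is a tty:", is_tty)    # False: stdout was redirected
print("stdout encoding:", encoding)  # a codec name, not None, under Python 3
```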
The environment variable PYTHONIOENCODING can also be used to override the default stdin/stdout/stderr encoding.
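As a sketch of that override, assuming a Python 3 interpreter (the variable is also honored by Python 2.6+), the following launches a child with stdout piped, as under a web server, and `PYTHONIOENCODING=utf-8` set in its environment. The child then encodes the Cyrillic text as UTF-8 without any explicit `.encode()` call.

```python
import os
import subprocess
import sys

TEXT = "Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!"

# Copy the current environment and force the child's stdio encoding to UTF-8.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# ascii() keeps the child's source code pure ASCII (\uXXXX escapes), so the
# command line itself cannot run into an encoding problem.
child = "import sys; print(sys.stdout.encoding); print(%s)" % ascii(TEXT)
out = subprocess.run([sys.executable, "-c", child], stdout=subprocess.PIPE,
                     env=env, check=True).stdout

# The child's stdout is a pipe, yet it encoded the Cyrillic text as UTF-8.
reported_encoding, printed = out.decode("utf-8").splitlines()
print(reported_encoding)
print(printed == TEXT)
```

For a CGI script, the variable would typically be set in the server configuration (for example with Apache's `SetEnv`) rather than in code.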
Try wrapping stdin and stdout with UTF-8 codecs:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import cgitb
import sys
import codecs
# Wrap stdout so that every unicode string printed to it is encoded as UTF-8
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
# If you need input too, read from char_stream as you would sys.stdin
char_stream = codecs.getreader('utf-8')(sys.stdin)
cgitb.enable()
print "Content-Type: text/html;charset=utf-8"
print
s=u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
print s
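To show what the wrapper does without a web server, this Python 3 sketch points the same `codecs.getwriter('utf-8')` wrapper at an in-memory byte buffer. The `io.BytesIO` object is an assumption for illustration, standing in for the raw output stream the server reads.

```python
import codecs
import io

TEXT = "Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!"

# In-memory byte stream standing in for the raw CGI output (illustrative
# assumption; in a real script this would be the process's stdout).
raw = io.BytesIO()

# codecs.getwriter('utf-8') returns a StreamWriter class; an instance wraps the
# byte stream and encodes every string passed to write() as UTF-8.
out = codecs.getwriter("utf-8")(raw)
out.write("Content-Type: text/html;charset=utf-8\n\n")
out.write(TEXT + "\n")

body = raw.getvalue()  # bytes, i.e. what the web server would receive
print(body)
```

In Python 3 itself, `sys.stdout` is already a text stream, so the usual equivalent is `sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')` or, on Python 3.7+, `sys.stdout.reconfigure(encoding='utf-8')`.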