I decided to use Python 3 for making my website, but I encountered a problem with Unicode output.
It seems like plain print(html) #html is a
str
should be working, but it's not. I get UnicodeEncodeError: 'ascii' codec can't encode characters[...]: ordinal not in range(128)
. This must be because the webserver doesn't support unicode output.
The next thing I tried was print(html.encode('utf-8'))
, but I got something like repr output of the byte string: it is placed inside b'...'
and all the escape characters are in raw form (e.g. \n
and \xd0\x9c
)
Please show me the correct way to output a Unicode (str) string as a raw UTF-8 encoded bytes string in Python 3.1
The problem here is that you stdout isn't attached to an actual terminal and will use the ASCII encoding by default. Therefore you need to write to sys.stdout.buffer, which is the "raw" binary output of sys.stdout. This can be done in various ways, the most common one seems to be:
import codecs, sys
writer = codecs.getwriter('utf8')(sys.stdout.buffer)
And the use writer. In a CGI script you may be able to replace sys.stdout with the writer so:
sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
Might actually work so you can print normally. Try that!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With