I want to write a non-ascii character, lets say →
to standard output. The tricky part seems to be that some of the data that I want to concatenate to that string is read from json. Consider the follwing simple json document:
{"foo":"bar"}
I include this because if I just want to print →
then it seems enough to simply write:
print("→")
and it will do the right thing in python2 and python3.
So I want to print the value of foo
together with my non-ascii character →
. The only way I found to do this such that it works in both, python2 and python3 is:
getattr(sys.stdout, 'buffer', sys.stdout).write(data["foo"].encode("utf8")+u"→".encode("utf8"))
or
getattr(sys.stdout, 'buffer', sys.stdout).write((data["foo"]+u"→").encode("utf8"))
It is important to not miss the u
in front of →
because otherwise a UnicodeDecodeError
will be thrown by python2.
Using the print
function like this:
print((data["foo"]+u"→").encode("utf8"), file=(getattr(sys.stdout, 'buffer', sys.stdout)))
doesnt seem to work because python3 will complain TypeError: 'str' does not support the buffer interface
.
Did I find the best way or is there a better option? Can I make the print function work?
The most concise I could come up with is the following, which you may be able to make more concise with a few convenience functions (or even replacing/overriding the print function):
# -*- coding=utf-8 -*-
import codecs
import os
import sys
# if you include the -*- coding line, you can use this
output = 'bar' + u'→'
# otherwise, use this
output = 'bar' + b'\xe2\x86\x92'.decode('utf-8')
if sys.stdout.encoding == 'UTF-8':
print(output)
else:
output += os.linesep
if sys.version_info[0] >= 3:
sys.stdout.buffer.write(bytes(output.encode('utf-8')))
else:
codecs.getwriter('utf-8')(sys.stdout).write(output)
The best option is using the -*- encoding line, which allows you to use the actual character in the file. But if for some reason, you can't use the encoding line, it's still possible to accomplish without it.
This (both with and without the encoding line) works on Linux (Arch) with python 2.7.7 and 3.4.1. It also works if the terminal's encoding is not UTF-8. (On Arch Linux, I just change the encoding by using a different LANG environment variable.)
LANG=zh_CN python test.py
It also sort of works on Windows, which I tried with 2.6, 2.7, 3.3, and 3.4. By sort of, I mean I could get the '→'
character to display only on a mintty terminal. On a cmd terminal, that character would display as '→'
. (There may be something simple I'm missing there.)
If you don't need to print to sys.stdout.buffer
, then the following should print fine to sys.stdout
. I tried it in both Python 2.7 and 3.4, and it seemed to work fine:
# -*- coding=utf-8 -*-
print("bar" + u"→")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With