 

Python subprocess echo a unicode literal

I'm aware that questions like this have been asked before. But I'm not finding a solution.

I want to use a unicode literal, defined in my Python file, with the subprocess module, but I'm not getting the results that I need. For example, the following code

# -*- coding: utf-8 -*-
import sys
import codecs
import subprocess
cmd = ['echo', u'你好']
new_cmd = []
for c in cmd:
    if isinstance(c,unicode):
        c = c.encode('utf-8')
    new_cmd.append(c)
subprocess.call(new_cmd)

prints out

你好

If I change the code to

# -*- coding: utf-8 -*-
import sys
import codecs
import subprocess
cmd = ['echo', u'你好']
new_cmd = []
for c in cmd:
    if isinstance(c,unicode):
        c = c.encode(sys.getfilesystemencoding())
    new_cmd.append(c)
subprocess.call(new_cmd)

I get the following

??

At this stage I can only assume that I'm repeatedly making a simple mistake, but I'm having a hard time figuring out what it is. How can I get echo to print out the following when invoked via Python's subprocess module?

你好

Edit:

The version of Python is 2.7. I'm running on Windows 8 but I'd like the solution to be platform independent.

Shane Gannon asked May 05 '15 13:05


1 Answer

Your first try was the best.

You correctly encoded the two Unicode characters u'你好' (that is, u'\u4f60\u597d') to UTF-8, which gives b'\xe4\xbd\xa0\xe5\xa5\xbd'.
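As a minimal sketch of that encoding step (the variable names are only illustrative), the following checks that the two characters become six UTF-8 bytes, three per character. It is written to run unchanged on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# Illustrative check of the encoding step; names here are not from the question.
text = u'\u4f60\u597d'            # the same two characters as u'你好'
encoded = text.encode('utf-8')    # the byte string the first attempt hands to subprocess

# Two characters, each taking three bytes in UTF-8.
assert len(text) == 2
assert len(encoded) == 6
assert encoded == b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Per-character breakdown, printed with ASCII-only repr() so it is safe on any console.
for character in text:
    print(repr(character.encode('utf-8')))
```

Only repr() output is printed here, so the check itself does not depend on what the console can display.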

You can check this in IDLE, which fully supports Unicode: there, b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode('utf-8') gives back 你好. Another way to check it is to redirect the script output to a file and open that file with a UTF-8 compatible editor: there again you will see what you want.
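The file check can also be done from Python itself. This is only a sketch under some assumptions: the child process is a small stand-in script run with sys.executable (not your echo command), and the temporary file plays the role of the redirected output. It reads the bytes back as UTF-8 without involving the console at all:

```python
# -*- coding: utf-8 -*-
import io
import os
import subprocess
import sys
import tempfile

expected = u'\u4f60\u597d'

# Hypothetical stand-in for "the script": it writes the raw UTF-8 bytes to stdout.
# sys.stdout.buffer exists on Python 3 only; Python 2 writes bytes to sys.stdout directly.
child_code = (
    "import sys\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(b'\\xe4\\xbd\\xa0\\xe5\\xa5\\xbd')\n"
)

fd, path = tempfile.mkstemp(suffix='.txt')
try:
    # Redirect the child's stdout to the file instead of the console.
    with os.fdopen(fd, 'wb') as out_file:
        return_code = subprocess.call([sys.executable, '-c', child_code],
                                      stdout=out_file)
    # Read the file back, decoding it as UTF-8.
    with io.open(path, encoding='utf-8') as in_file:
        captured = in_file.read()
finally:
    os.remove(path)

print(captured == expected)
```

If the comparison holds, the bytes leaving the process are right, and whatever goes wrong happens when the console displays them.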

But the problem is that the Windows console does not fully support Unicode. What it can display depends on:

  • the installed code page: I do not know about Windows 8, but previous versions had poor Unicode support and could display only 256 characters
  • the font used in the console: not all fonts have glyphs for all characters.

If you know a code page that contains your characters (I don't), you can try to select it in the console with chcp and explicitly encode your unicode string to it. But on my French machine, I do not know how to do that, except by going through a text file!
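To find out whether a given code page can represent the characters at all, you can simply try to encode to it. In the sketch below, the helper can_encode and the list of candidates are my own illustrative choices (cp936 is the Python codec name for the Simplified Chinese code page, which is an assumption about what might suit you, not something taken from your system). A successful encode says nothing about whether the console font has the glyphs.

```python
# -*- coding: utf-8 -*-
text = u'\u4f60\u597d'


def can_encode(value, codec_name):
    """Return True if every character of value exists in the given codec."""
    try:
        value.encode(codec_name)
        return True
    except UnicodeEncodeError:
        return False


# Candidate code pages: illustrative assumptions only.
candidates = ['cp850', 'cp1252', 'cp936', 'utf-8']
results = dict((name, can_encode(text, name)) for name in candidates)
for name in candidates:
    print('%-7s %s' % (name, results[name]))

# A lossy encode substitutes '?' for every character the code page lacks.
lossy = text.encode('cp1252', 'replace')
print(repr(lossy))
```

The lossy result is one '?' per character, which looks like the ?? from your second attempt. My guess, and it is only a guess, is that the file system encoding on your machine is a code page without these characters and the encode silently substituted them.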

Since you mentioned ConEmu, I gave it a try, and it works fine there with Python 3.4:

chcp 65001
py -3
import subprocess
cmd = ['cmd', '/c', 'echo', u'\u4f60\u597d']
subprocess.call(cmd)

gives:

你好  
0

The problem is only in the cmd.exe console window!

Serge Ballesta answered Sep 18 '22 15:09