
Python 3.0: how to make print() output Unicode?

I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.

print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)

Output is (changing angle brackets to square brackets for readability):

    sys.stdout encoding is "cp1252"
    Traceback (most recent call last):
      File "TestPrintEncoding.py", line 22, in [module]
        print(str1)
      File "C:\Python30\lib\io.py", line 1491, in write
        b = encoder.encode(s)
      File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 4: character maps to [undefined]

Note that ü = \xfc = 252 gives no problem, since it falls within the 8-bit range that cp1252 covers (it is not ASCII, which stops at 127). But ā = \u0101 is beyond 8 bits and has no cp1252 mapping.
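The two characters can be checked against cp1252 directly. This is a minimal sketch; the helper name `encodable` is made up for illustration, and the string is written with escapes so the source file's own encoding does not matter.

```python
# Check which characters of the test string cp1252 can represent.
# `encodable` is a hypothetical helper, not part of any library.
def encodable(ch, encoding="cp1252"):
    """Return True if ch can be encoded in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# "l\u00fcel\u0101" is the same text as 'lüelā'.
for ch in "l\u00fcel\u0101":
    # ascii() keeps the output printable on any console encoding.
    print(ascii(ch), encodable(ch))
```

Only the last character, '\u0101', should come back as not encodable, which matches the traceback above.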

Anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.


Apologies, I gave you the program without the preamble. Before the 3 lines given, it starts like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

Unfortunately, the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!

bigturtle asked Feb 03 '09 13:02



1 Answer

The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python handles them correctly internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.

See this thread for a full explanation: http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html
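If the goal is only to stop print() from raising, or to send UTF-8 bytes to a terminal that can show them, something along these lines may help. It is a rough sketch under assumptions, not a fix for cmd.exe itself: the function names `safe_for_console` and `make_utf8_writer` are made up for illustration, and whether UTF-8 output renders correctly still depends on the terminal.

```python
import io


def safe_for_console(text, encoding):
    """Replace characters the target encoding lacks with \\uXXXX escapes.

    Hypothetical helper: the result can always be encoded in `encoding`,
    so printing it to a console that uses that encoding will not raise.
    """
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def make_utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that encodes as UTF-8.

    Hypothetical helper. For real console output one would pass
    sys.stdout.buffer and keep the wrapper alive, for example
    sys.stdout = make_utf8_writer(sys.stdout.buffer).
    On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8")
    does the same job in place.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


str1 = "l\u00fcel\u0101"  # same text as 'lüelā'

# Option 1: degrade gracefully on a cp1252 console.
print(ascii(safe_for_console(str1, "cp1252")))

# Option 2: write UTF-8 bytes. An in-memory buffer stands in for the console here.
buffer = io.BytesIO()
writer = make_utf8_writer(buffer)
writer.write(str1)
writer.flush()
print(buffer.getvalue())
```

The first option keeps the cp1252 console and shows an escape in place of ā; the second changes what bytes are written, which only helps when the program on the other end expects UTF-8.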

Brandon answered Oct 20 '22 14:10