Handling Unicode strings in Windows

I tried one of my Python scripts, which deals with Unicode characters, on Windows (Vista) for the first time and found that it doesn't work. The script works fine on Linux and OS X, but no joy on Windows. Here is the little script that I tried:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import os, sys, codecs

reload(sys)
sys.setdefaultencoding('utf-8')
print "\nDefault encoding\t: %s" % sys.getdefaultencoding()
print "sys.stdout.encoding\t: %s\n" % sys.stdout.encoding

## Unicode strings
ln1 = u"?0>9<8~7|65\"4:3}2{1+_)(*&^%$£@!/`\\][=-"
ln2 = u"mnbvc xzasdfghjkl;'poiuyàtrewq€é#¢."

refStr = u"%s%s" % (ln2,ln1)
print "refSTR: ", refStr

for x in refStr:
    print "%s => %s" % (x, ord(u"%s" % x))

When I run the script from the Windows command line, I get this error:

C:\Users\san\Scripts>python uniCode.py

Default encoding        : utf-8
sys.stdout.encoding     : cp850

refSTR;  Traceback (most recent call last):
  File "uniCode.py", line 18, in <module>
    print "refSTR; ", refStr
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 30: character maps to <undefined>

I came across this Python wiki page and tried a few things from it, but none of them worked. Does anyone know what I'm still missing? Any help is greatly appreciated. Cheers!

MacUsers asked Dec 12 '25 00:12


1 Answer

The Windows console has a Unicode API, but it is a wide-character API, not UTF-8. Python is trying to encode your Unicode string to the console's 8-bit code page, cp850, and that fails for any character the code page cannot represent, such as the euro sign (u'\u20ac') in your traceback. There's supposedly a code page in the Windows console that supports UTF-8 (chcp 65001), but it's severely broken. Read issue 1602 and look at sys_write_stdout.patch and unicode2.py, which use Unicode wide-character functions such as WriteConsoleOutputW and WriteConsoleW. Unfortunately, it's a low-priority issue.
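The failure in the traceback can be reproduced on any platform, because it comes from the cp850 codec rather than from the console itself. The sketch below shows the strict encode failing and one possible stopgap: encoding with an error handler so that unsupported characters are substituted instead of raising. The helper name encode_for_console is made up for illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: encode_for_console is a hypothetical helper, not part of Python.


def encode_for_console(text, encoding="cp850", errors="replace"):
    """Encode a Unicode string for an 8-bit console code page.

    Characters missing from the code page are substituted according to
    `errors` instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, errors)


euro = u"\u20ac"  # EURO SIGN, the character named in the traceback

# A strict encode reproduces the failure: cp850 has no euro sign.
try:
    euro.encode("cp850")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "replace" turns the euro sign into "?"; "backslashreplace" keeps it
# readable as the escape sequence \u20ac.
lossy = encode_for_console(u"price: 5\u20ac")
escaped = encode_for_console(u"price: 5\u20ac", errors="backslashreplace")

print(strict_failed)
print(repr(lossy))
print(repr(escaped))
```

This only avoids the exception; the console still cannot display the euro sign. Characters that cp850 does contain, such as é and à, pass through unchanged.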

FYI, you can also use IDLE, or another GUI console (based on pythonw.exe), to run a script that outputs Unicode characters. For example:

C:\pythonXX\Lib\idlelib\idle.pyw -r script.py

But it's not a general solution if you need to write command-line tools.
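For command-line tools, the wide-character route mentioned above can be sketched with ctypes. This is a minimal illustration of calling WriteConsoleW directly, not the code from the patches attached to issue 1602. The helper names write_console and utf16_units are made up, and the sketch does not handle partial writes or very long strings.

```python
# -*- coding: utf-8 -*-
# Sketch only: write_console and utf16_units are hypothetical helper names.
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Win32 constant identifying the standard output handle


def utf16_units(text):
    """Return the number of UTF-16 code units in text.

    WriteConsoleW takes its length argument in these units, which can
    differ from len(text) for characters outside the BMP.
    """
    return len(text.encode("utf-16-le")) // 2


def write_console(text):
    """Write text with WriteConsoleW; return True if that path was used.

    Falls back to sys.stdout.write when not running on Windows or when
    stdout is not a real console (for example, redirected to a file).
    The fallback is subject to the usual stdout encoding limits.
    """
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        # Handles are pointer-sized, so declare the return type explicitly.
        kernel32.GetStdHandle.restype = ctypes.c_void_p
        kernel32.WriteConsoleW.argtypes = [
            ctypes.c_void_p,                  # hConsoleOutput
            ctypes.c_wchar_p,                 # lpBuffer
            ctypes.c_ulong,                   # nNumberOfCharsToWrite
            ctypes.POINTER(ctypes.c_ulong),   # lpNumberOfCharsWritten
            ctypes.c_void_p,                  # lpReserved (must be NULL)
        ]
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        written = ctypes.c_ulong(0)
        ok = kernel32.WriteConsoleW(
            handle, text, utf16_units(text), ctypes.byref(written), None
        )
        if ok:
            return True
    # WriteConsoleW fails on redirected output, so use the normal stream.
    sys.stdout.write(text)
    return False
```

Called from a real Windows console, write_console(u"\u20ac\n") should bypass the cp850 encoding step entirely; whether the glyph is actually displayed still depends on the console font.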

Eryk Sun answered Dec 14 '25 14:12