
Why do I get IOErrors when writing Unicode to the CMD? (With codepage 65001)

I'm on the CMD in Windows 8 and I've set the codepage to 65001 (chcp 65001). I'm using Python 2.7.2 (ActivePython 2.7.2.5) and I've set the PYTHONSTARTUP environment variable to "bootstrap.py".

bootstrap.py:

import codecs
codecs.register(
    lambda name: name == 'cp65001' and codecs.lookup('UTF-8') or None
)

This lets me print ASCII:

>>> print 'hello'
hello
>>> print u'hello'
hello

But the errors I get when I try to print a Unicode string with non-ASCII characters make no sense to me. Here I try to print a few strings containing Nordic characters (I added the extra line break between the prints for readability):

>>> print u'æøå'
��øåTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory

>>> print u'åndalsnes'
��ndalsnes

>>> print u'åndalsnesæ'
��ndalsnesæTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] Invalid argument

>>> print u'Øst'
��st

>>> print u'uØst'
uØstTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] Invalid argument

>>> print u'ØstÆØÅæøå'
��stÆØÅæøåTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] Invalid argument

>>> print u'_ØstÆØÅæøå'
_ØstÆØÅæøåTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] Invalid argument

As you can see, it doesn't always raise an error (and doesn't even raise the same error every time), and the Nordic characters are only occasionally displayed correctly.

Can somebody explain this behavior, or at least help me figure out how to print Unicode to the CMD correctly?

Hubro asked Nov 19 '12 11:11


1 Answer

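Try this:

# -*- coding: utf-8 -*-
# Every line must start at column 0; an indented import raises IndentationError.
from __future__ import unicode_literals
print u'æøå'

Using from __future__ import unicode_literals can also be useful in an interactive Python session, where it makes plain string literals Unicode without the u prefix.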

It is certainly possible to write Unicode to the console successfully using WriteConsoleW. This works regardless of the console code page, including 65001. The code here does so (it's for Python 2.x, but you'd be calling WriteConsoleW from C anyway).

WriteConsoleW has one bug that I know of, which is that it fails when writing more than 26608 characters at once. That's easy to work around by limiting the amount of data passed in a single call.
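The workaround described above can be sketched in Python with ctypes. This is only a minimal illustration, not the code the answer refers to: the names iter_chunks, write_console and MAX_CHARS are hypothetical, and the chunk size of 10000 is an assumed value chosen to stay well below the 26608-character limit. It also assumes the text contains only characters in the Basic Multilingual Plane, so that len() matches the number of UTF-16 code units, and that standard output really is a console (WriteConsoleW fails on a redirected handle).

```python
import sys

# Assumed chunk size, comfortably under the 26608-character limit.
MAX_CHARS = 10000


def iter_chunks(text, limit=MAX_CHARS):
    """Yield successive slices of text, each at most `limit` characters long."""
    for start in range(0, len(text), limit):
        yield text[start:start + limit]


def write_console(text, limit=MAX_CHARS):
    """Write a Unicode string to the Windows console, one chunk at a time."""
    if sys.platform != 'win32':
        raise OSError('WriteConsoleW is only available on Windows')

    # Imported here so the module can still be loaded on other platforms.
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.windll.kernel32
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,                  # console output handle
        wintypes.LPCWSTR,                 # characters to write
        wintypes.DWORD,                   # number of characters to write
        ctypes.POINTER(wintypes.DWORD),   # receives the number written
        wintypes.LPVOID,                  # reserved, must be NULL
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    # STD_OUTPUT_HANDLE is defined by the Windows API as (DWORD)-11.
    handle = kernel32.GetStdHandle(wintypes.DWORD(-11))
    written = wintypes.DWORD(0)

    for chunk in iter_chunks(text, limit):
        ok = kernel32.WriteConsoleW(
            handle, chunk, len(chunk), ctypes.byref(written), None)
        if not ok:
            raise ctypes.WinError()
```

On Windows, in a real console window, a call such as write_console(u'æøå\n') would go through this path instead of through the code page encoder. The sketch does not retry when fewer characters are written than requested, which a more careful version would handle.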

Fonts are not Python's problem, but encoding is. It doesn't make sense to fail to output the right characters just because some users might not have selected fonts that can display those characters. This bug should be reopened.

(For completeness, it is possible to display Unicode on the console using fonts other than Lucida Console and Consolas, but it requires a registry hack.) I hope it helps.

Soheil answered Oct 03 '22 22:10