Can someone explain this odd thing to me?
When I type the following Cyrillic string in the Python shell, it prints fine:
>>> print 'абвгд'
абвгд
but when I type:
>>> print u'абвгд'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
Since the first string came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why?
>>> print 'абвгд'
абвгд
When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as utf-8, ISO-8859-5 or even something that only your terminal understands. Python gets these characters as some sequence of bytes. Then python prints out these bytes as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way as it encoded them before, everything is displayed like you typed it in.
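To make the "sequence of bytes" idea concrete, here is a small sketch. It assumes the terminal uses UTF-8, which is only an assumption for illustration; your terminal may use something else. The names `text`, `raw`, `byte_count` and `round_trip` are made up for this example.

```python
# -*- coding: utf-8 -*-
# Sketch: the terminal hands Python bytes, not characters.
text = u'\u0430\u0431\u0432\u0433\u0434'  # the five Cyrillic letters from the question

# Assumption: the terminal encodes as UTF-8, where each of these letters takes two bytes.
raw = text.encode('utf-8')
byte_count = len(raw)

# Printing the bytes unchanged works because the terminal decodes them
# with the same encoding it used to produce them.
round_trip = raw.decode('utf-8')

print(byte_count)
print(round_trip == text)
```

So five typed letters can reach Python as ten bytes, and as long as nobody reinterprets them on the way, they come back out looking right.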
>>> u'абвгд'
Here you type in some characters that arrive at the Python interpreter as a sequence of bytes, maybe encoded in some way by the terminal. With the u prefix Python tries to convert this data to unicode. To do this correctly Python has to know what encoding your terminal uses. In your case it looks like Python guesses your terminal's encoding would be ASCII, but the received data doesn't match that, so you get an encoding error.
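The same kind of failure can be reproduced without a terminal by asking for ASCII explicitly. This is only a sketch of the mechanism, not a diagnosis of your exact setup; `try_encode` is a hypothetical helper written for this example.

```python
# Sketch: Cyrillic text cannot be represented in ASCII, but it can in UTF-8.
text = u'\u0430\u0431\u0432\u0433\u0434'


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


ascii_result = try_encode(text, 'ascii')
utf8_result = try_encode(text, 'utf-8')

print(ascii_result is None)      # ASCII has no Cyrillic letters
print(utf8_result is not None)   # UTF-8 can encode them
```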
The straightforward way to create unicode strings in an interactive session would therefore be something like this:
>>> us = 'абвгд'.decode('my-terminal-encoding')
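Which name goes in place of 'my-terminal-encoding' matters, because the same text is a different byte sequence in each encoding. A sketch with two encodings mentioned above (the byte values are written out by hand here; the variable names are just for illustration):

```python
# Sketch: the same five letters as bytes in two different encodings.
utf8_bytes = b'\xd0\xb0\xd0\xb1\xd0\xb2\xd0\xb3\xd0\xb4'  # UTF-8, two bytes per letter
iso_bytes = b'\xd0\xd1\xd2\xd3\xd4'                        # ISO-8859-5, one byte per letter

# Decoding each byte string with its matching encoding gives the same unicode text.
from_utf8 = utf8_bytes.decode('utf-8')
from_iso = iso_bytes.decode('iso-8859-5')

print(from_utf8 == from_iso)
```

Decoding with the wrong name either raises an error or silently produces different characters, so it is worth checking what your terminal is really configured to use.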
In files you can also specify the encoding of the file with a special mode line:
# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'
For other ways to inspect or set the default encoding you can look at sys.stdin.encoding or sys.setdefaultencoding(...).
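A quick way to see what Python thinks the terminal uses is to look at the encoding attribute of the standard streams. The sketch below is hedged: `stream_encoding` is a hypothetical helper, and the 'ascii' fallback is just an assumption for the case where no encoding was detected (for example when input or output is piped).

```python
import sys


def stream_encoding(stream, fallback='ascii'):
    """Return the stream's detected encoding, or the fallback if none is set."""
    return getattr(stream, 'encoding', None) or fallback


stdin_encoding = stream_encoding(sys.stdin)
stdout_encoding = stream_encoding(sys.stdout)

print('stdin:  ' + stdin_encoding)
print('stdout: ' + stdout_encoding)
```

If this reports an ASCII variant while your terminal is really set to UTF-8, that mismatch would be consistent with the error in the question.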