Can't decode UTF-8 string in Python on OS X Terminal.app

Question

I have Terminal.app set to accept UTF-8, and in bash I can type Unicode characters and copy and paste them. But if I start the Python shell I can't, and if I try to decode Unicode I get errors:

>>> wtf = u'\xe4\xf6\xfc'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> wtf = u'\xe4\xf6\xfc'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Anyone know what I'm doing wrong?

sth · Accepted Answer

I think there is encode/decode confusion all over the place. You start with a unicode object:

u'\xe4\xf6\xfc'

The three characters in this object are the Unicode code points for "äöü". If you want to turn them into UTF-8, you have to encode them:

>>> u'\xe4\xf6\xfc'.encode('utf-8')
'\xc3\xa4\xc3\xb6\xc3\xbc'

The resulting six bytes are the UTF-8 representation of "äöü".
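To make the direction of each call concrete, here is a minimal round-trip sketch. The variable names are mine, and it is written so that it should run on Python 2.6+ as well as Python 3.3+ (the b'' prefix is not available on Python 2.5):

```python
text = u'\xe4\xf6\xfc'            # unicode object with three code points
encoded = text.encode('utf-8')    # unicode -> byte string

# Each of these three characters needs two bytes in UTF-8.
assert len(text) == 3
assert len(encoded) == 6
assert encoded == b'\xc3\xa4\xc3\xb6\xc3\xbc'

# decode() goes the other way: byte string -> unicode.
decoded = encoded.decode('utf-8')
assert decoded == text
```

So encode() is what you call on a unicode object, and decode() is what you call on the bytes you got from somewhere else.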

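If you call decode(...), you are asking Python to interpret the data as bytes in some encoding that still need to be converted to Unicode. Since the object already is Unicode, this doesn't work: Python 2 first has to turn the unicode object into a byte string, which it does implicitly with the default ASCII codec, and that step fails because "äöü" cannot be represented in ASCII. That is why both calls raise a UnicodeEncodeError rather than a decode error. Your first call attempts an ASCII-to-Unicode conversion and the second a UTF-8-to-Unicode conversion, but neither gets as far as decoding.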
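A small sketch of the step that actually fails, assuming the default encoding is ASCII (which the 'ascii' codec named in the traceback suggests). Encoding the unicode object to ASCII by hand, which is what Python 2 appears to do behind the scenes before decoding a unicode object, raises the same kind of error. The flag name is just for illustration:

```python
text = u'\xe4\xf6\xfc'

# Reproduce the implicit conversion explicitly: unicode -> ASCII bytes.
try:
    text.encode('ascii')
    implicit_step_failed = False
except UnicodeEncodeError:
    # None of the three characters exists in ASCII (ordinals >= 128).
    implicit_step_failed = True

assert implicit_step_failed
```

On Python 3 the confusion cannot arise in this form, because str objects there have no decode() method at all.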

Further confusion might come from the fact that '\xe4\xf6\xfc' is also the Latin-1/ISO-8859-1 encoding of "äöü". If you write a normal Python string (without the leading "u" that marks it as unicode), you can convert it to a unicode object with decode('latin1'):

>>> '\xe4\xf6\xfc'.decode('latin1')
u'\xe4\xf6\xfc'
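As a rough check of that last point, the same three bytes can be decoded as Latin-1 but are not valid UTF-8. This is only a sketch with names of my own choosing, using the b'' prefix so that it should behave the same on Python 2.6+ and Python 3:

```python
raw = b'\xe4\xf6\xfc'   # three bytes, no "u" prefix

# Latin-1 maps every byte to the code point with the same number.
as_latin1 = raw.decode('latin-1')
assert as_latin1 == u'\xe4\xf6\xfc'

# The same bytes do not form a valid UTF-8 sequence.
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

assert not valid_utf8
```

So which codec you pass to decode() has to match how the bytes were produced in the first place.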

Tags:

python

terminal

macos

unicode

Bjorn
