
exceptions with python unicode encode/decode functions (why doesn't errors=ignore actually ignore them??)

Tags:

python

unicode

Does anyone know why the string conversion functions throw exceptions when errors="ignore" is passed? How can I convert from regular Python string objects to unicode without errors being thrown? Thanks very much!

python -c "import codecs; codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write('кошка')"

returns
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.6/codecs.py", line 686, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

EDIT -- thanks for the responses, but does anyone know how to convert the literal above without using the "u" prefix? The reason is that you could, of course, be dealing with something that isn't a constant :)

gatoatigrado asked Apr 21 '10 02:04


1 Answer

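The write method (in Python 2) takes a unicode object, and you're passing it a str, so the encode call at codecs.py line 351 first tries to build a unicode object from it using the default codec, 'ascii'. That implicit decode is what raises the UnicodeDecodeError; errors='ignore' only governs the UTF-8 encoding step, so it never gets a chance to apply. The fix is easy: change the write call to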

write(u'кошка')

The u prefix makes the literal a unicode object, so no implicit ASCII decode is needed and the write should succeed.
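
For data that is not a literal (the case raised in the question's EDIT), the same idea applies: decode the byte string to unicode yourself, with an explicit encoding, before handing it to the writer. The following is only a minimal sketch, not the approach from the original post. It assumes the bytes are UTF-8, uses a hypothetical helper named to_text, and swaps in io.open for codecs.open because it behaves the same way on Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def to_text(data, encoding="utf-8", errors="strict"):
    """Hypothetical helper: return unicode text for either bytes or text input.

    Byte strings are decoded explicitly with the given encoding, so Python
    never falls back to its implicit ASCII decode. Text is returned unchanged.
    """
    if isinstance(data, bytes):  # on Python 2, bytes is an alias for str
        return data.decode(encoding, errors)
    return data


# Stand-in for non-constant input; assumed to be UTF-8 encoded.
raw = b"\xd0\xba\xd0\xbe\xd1\x88\xd0\xba\xd0\xb0"

# Decode first, then write: the writer only ever sees unicode text.
text = to_text(raw)

path = os.path.join(tempfile.mkdtemp(), "tmp")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back as raw bytes to confirm the round trip.
with io.open(path, "rb") as f:
    written = f.read()
```

If the input may contain bytes that are not valid in the assumed encoding, passing errors="ignore" (or "replace") to the decode step is where that option actually takes effect.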

Alex Martelli answered Oct 10 '22 02:10