I run into this a lot: I decode or encode a Unicode string in Eclipse (PyDev) and it runs exactly as I expect, but when I launch the same script from the command line (for example), I get encoding errors.
Is there a simple explanation for this? Is Eclipse handling the Unicode differently in some way?
EDIT:
Example:
value = u'\u2019'.decode( 'utf-8', 'ignore' )
return value
This works in Eclipse (PyDev) but not if I run it in IDLE or on the command line:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 135: ordinal not in range(128)
Just wanted to add why it worked in PyDev: it ships a special sitecustomize that calls sys.setdefaultencoding so that Python's default encoding matches the encoding of the PyDev console.
Note that the response from bobince is correct: if you have a unicode string, you have to use the encode() method to transform it into a byte string (you'd use decode() if you had a byte string and wanted to transform it into a unicode string).
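If you want to see what differs between the two environments, a quick check like the one below can help. This is only a sketch: describe_encodings is a hypothetical helper name, not something PyDev provides, and it just reports what the running interpreter says about itself.

```python
import sys

def describe_encodings():
    """Report the encodings the running interpreter is using.

    describe_encodings is a hypothetical helper for illustration only.
    """
    return {
        # Encoding used for implicit str/unicode conversions
        "default": sys.getdefaultencoding(),
        # Encoding of the console the script writes to (may be None when piped)
        "stdout": getattr(sys.stdout, "encoding", None),
    }

print(describe_encodings())
```

Running this once inside PyDev and once from the command line should show whether the default encoding is what differs.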
value = u'\u2019'.decode( 'utf-8', 'ignore' )
Byte strings are DECODED into Unicode strings.
Unicode strings are ENCODED into byte strings.
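A minimal round trip showing both directions, with the encoding always named explicitly (the variable names are just for illustration):

```python
text = u'\u2019'                   # a Unicode string: RIGHT SINGLE QUOTATION MARK

data = text.encode('utf-8')        # Unicode string -> byte string
roundtrip = data.decode('utf-8')   # byte string -> Unicode string

assert data == b'\xe2\x80\x99'     # the three UTF-8 bytes for U+2019
assert roundtrip == text           # decoding gives back the original text
```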
So if you say someunicodestring.decode, it tries to coerce the Unicode string to a byte string, in order to be able to decode it (back to Unicode!). Being an implicit conversion, this encoding step will plump for the default encoding, which may differ between different environments, and is likely to be the ‘safe’ value ascii, which will certainly produce the error you mention as ASCII can't contain the character U+2019. It's almost never a good idea to rely on the default encoding.
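To make the hidden step visible, here is a rough sketch that performs the same encode explicitly. implicit_step is a hypothetical helper that only mimics what the interpreter does behind the scenes; the default_encoding argument stands in for whatever the environment's default happens to be.

```python
def implicit_step(text, default_encoding='ascii'):
    """Mimic the hidden encode that runs before decode() on a Unicode string.

    Hypothetical helper: the real conversion is done by the interpreter
    using its default encoding, not by user code.
    """
    return text.encode(default_encoding)

try:
    implicit_step(u'\u2019')           # ASCII cannot represent U+2019
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)                         # same kind of error as in the question

# With a default that can represent the character, the same step succeeds,
# which is why the script behaves differently from one environment to another.
encoded = implicit_step(u'\u2019', 'utf-8')
```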
So it doesn't make sense to try to decode a Unicode string. I'm pretty sure you mean:
value = u'\u2019'.encode('utf-8')
(ignore is redundant for encoding to UTF-8 as there is no character that this encoding can't represent.)
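A small comparison, using a made-up sample string, of what 'ignore' does with UTF-8 versus a narrower encoding such as ASCII:

```python
text = u'caf\u00e9 \u2019'                 # sample text, chosen only for illustration

strict = text.encode('utf-8')              # default error handling
ignored = text.encode('utf-8', 'ignore')   # 'ignore' has nothing to drop here
assert strict == ignored

# With ASCII, 'ignore' silently discards every character it cannot represent.
lossy = text.encode('ascii', 'ignore')
assert lossy == b'caf '
```

So with UTF-8 the extra argument changes nothing, while with ASCII it hides data loss rather than fixing it.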