Here is the code:
>>> z = u'\u2022'.decode('utf-8', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)
Why is UnicodeEncodeError raised when I am using .decode?
Why is any error raised when I am using 'ignore'?
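Most encodings map only a limited set of Unicode characters to byte strings. Any character the encoding cannot represent makes the encode step fail with UnicodeEncodeError. In Python 2, calling .decode() on a unicode object first encodes it implicitly to a str with the default codec (latin-1 in the traceback above), and that hidden step is what raises the error. The 'ignore' argument applies only to the decode that follows, so it cannot suppress the failure. To avoid the error, call .encode('utf-8') on unicode objects and .decode('utf-8') on byte strings, not the other way around.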
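The two questions above can be illustrated by spelling out, in Python 3 syntax, the two steps that Python 2 performs when .decode() is called on a unicode object. This is only a sketch: py2_style_decode is a hypothetical helper written for illustration, and the latin-1 default is an assumption taken from the traceback in the question.

```python
def py2_style_decode(text, encoding, errors="strict", default_encoding="latin-1"):
    """Hypothetical helper that mimics Python 2's unicode.decode()."""
    # Step 1 (implicit in Python 2): text -> bytes with the default codec.
    # This step is always strict; the caller's `errors` is not used here.
    raw = text.encode(default_encoding)
    # Step 2: the decode that was actually requested.
    return raw.decode(encoding, errors)


try:
    py2_style_decode("\u2022", "utf-8", "ignore")
    outcome = "no error"
except UnicodeEncodeError:
    # Raised by step 1, before 'ignore' ever gets a chance to apply.
    outcome = "UnicodeEncodeError"

print(outcome)
```

The bullet character U+2022 has no latin-1 representation, so the hidden encode fails first, which is why the exception is an encode error even though the call was .decode().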
UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. That is what "UTF" stands for: "Unicode Transformation Format."
UnicodeEncodeError normally occurs when encoding a unicode string into a particular encoding. Since most encodings map only a limited number of Unicode characters to str byte strings, a character that is not represented causes that encoding's encode() to fail.
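A short sketch of that failure, written for Python 3 (where the text type is str and the byte type is bytes). The sample string is made up for illustration. It also shows that error handlers such as 'ignore' and 'replace' work when they are passed to the encode call that actually fails.

```python
text = "20\u00b0 \u2022"  # "20", a degree sign, a space, and a bullet

# UTF-8 can represent these characters, so encoding succeeds.
utf8_bytes = text.encode("utf-8")

# ASCII covers only code points 0-127, so strict encoding fails.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Error handlers apply to the encode step they are passed to.
dropped = text.encode("ascii", "ignore")    # unencodable characters removed
replaced = text.encode("ascii", "replace")  # unencodable characters become "?"

print(strict_failed, dropped, replaced)
```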
Encoding refers to converting a Unicode string to bytes using an encoding scheme such as UTF-8. Decoding is the reverse: converting an encoded byte string back to a Unicode string using the same scheme.
When I first started messing around with Python strings and Unicode, it took me a while to understand the jargon of decode and encode too, so here's an explanation that may help:
Think of decoding as what you do to go from a regular bytestring to unicode and encoding as what you do to get back from unicode. In other words:
You de-code a str to produce a unicode string (in Python 2) and en-code a unicode string to produce a str (in Python 2).
So:
unicode_char = u'\xb0'
encodedchar = unicode_char.encode('utf-8')

encodedchar will contain your unicode character, represented in the selected encoding (in this case, utf-8).
The same principle applies to Python 3. You de-code a bytes object to produce a str object. And you en-code a str object to produce a bytes object.
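For Python 3, the round trip might look like the sketch below. It reuses the degree sign from the Python 2 example; the variable names are only illustrative.

```python
text = "\u00b0"                    # str: the degree sign, one code point
encoded = text.encode("utf-8")     # bytes: two bytes in UTF-8
decoded = encoded.decode("utf-8")  # str again, equal to the original

print(type(text).__name__, type(encoded).__name__, type(decoded).__name__)
print(encoded)           # b'\xc2\xb0'
print(decoded == text)   # True
```

In Python 3 a str has no .decode() method and a bytes object has no .encode() method, so the mix-up from the question fails with an AttributeError instead of an implicit conversion.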