unicode().decode('utf-8', 'ignore') raising UnicodeEncodeError

Tags: python, unicode

Here is the code:

>>> z = u'\u2022'.decode('utf-8', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)

Why is UnicodeEncodeError raised when I am using .decode?

Why is any error raised when I am using 'ignore'?

Facundo Casco asked Feb 23 '11 20:02



1 Answer

When I first started messing around with Python strings and unicode, it took me a while to understand the jargon of decode and encode too, so here's an earlier post of mine that may help:


Think of decoding as what you do to go from a regular bytestring to unicode and encoding as what you do to get back from unicode. In other words:

You de-code a str to produce a unicode string (in Python 2)

and en-code a unicode string to produce a str (in Python 2)

So:

unicode_char = u'\xb0'
encodedchar = unicode_char.encode('utf-8')

encodedchar will contain the bytes that represent your unicode character in the selected encoding (in this case, utf-8).
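As a rough illustration of the round trip described above, here is a minimal sketch. The names encoded_char and decoded_char are only illustrative, and the snippet is written to run on Python 3, where the u'' prefix is still accepted.

```python
unicode_char = u'\xb0'  # DEGREE SIGN, a single code point of text

# Encoding: text -> bytes in the chosen encoding.
encoded_char = unicode_char.encode('utf-8')
print(repr(encoded_char))

# Decoding: bytes -> text, using the same encoding.
decoded_char = encoded_char.decode('utf-8')
print(decoded_char == unicode_char)
```

The one character becomes two bytes in UTF-8, and decoding those bytes with the same codec gives back the original text.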

The same principle applies to Python 3. You de-code a bytes object to produce a str object. And you en-code a str object to produce a bytes object.
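To connect this back to the question: in Python 2, calling .decode() on an object that is already unicode makes the interpreter first encode it to a str with the default codec, and that hidden encode is what raises UnicodeEncodeError. The 'ignore' handler only applies to the decode step, which is never reached. The sketch below is written for Python 3, where str has no .decode() at all, so it imitates the hidden step with an explicit encode. The latin-1 codec is taken from the traceback and is an assumption about the asker's default encoding. The variable names are illustrative.

```python
bullet = '\u2022'  # the character from the question

# Python 3: str has no .decode(), so the mistake cannot pass silently.
has_decode = hasattr(bullet, 'decode')

# Stand-in for Python 2's hidden first step (assumed codec: latin-1).
try:
    bullet.encode('latin-1')
    implicit_error = None
except UnicodeEncodeError as exc:
    implicit_error = exc

# An error handler only affects the call it is passed to.
ignored = bullet.encode('latin-1', 'ignore')  # the character is dropped
round_trip = bullet.encode('utf-8').decode('utf-8', 'ignore')

print(has_decode, type(implicit_error).__name__, ignored, round_trip == bullet)
```

In other words, the fix for the original snippet is to call .decode() only on byte strings, and .encode() on text.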

Aphex answered Sep 25 '22 03:09