I keep getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 27: ordinal not in range(128)
I already tried
x.encode("ascii", "ignore")
x.encode("utf-8")
x.decode("utf-8")
However, nothing works.
An encoding such as ASCII can represent only a limited set of Unicode characters. Any character the target encoding cannot represent makes the encoding fail and raises UnicodeEncodeError. To avoid this error, call encode('utf-8') explicitly when converting unicode text to bytes, and decode('utf-8') when converting UTF-8 bytes back to unicode text.
The UnicodeEncodeError normally happens when encoding a unicode string with a particular codec. Since a codec maps only a limited number of Unicode characters to str byte strings, a character it does not cover causes that codec's encode() to fail.
The issue is that when you call str() on a unicode object, Python uses the default character encoding (ASCII in Python 2) to try to encode it, and that fails for any character outside ASCII, such as u'\xe4'. To fix the problem, tell Python which encoding to use by calling .encode() with an explicit encoding, for example .encode('utf-8').
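As a minimal sketch of the failure and the fix described above (the sample word and the variable names are made up for illustration, and the snippet is written to run on both Python 2.7 and Python 3):

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text containing U+00E4 (a-umlaut), the character from the error.
text = u"Gr\xe4fin"

# Encoding to ASCII fails because U+00E4 is outside the ASCII range (0-127).
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit UTF-8 encode succeeds; the character becomes two bytes.
utf8_bytes = text.encode("utf-8")

# The error handlers avoid the exception but lose information.
ignored = text.encode("ascii", "ignore")    # drops the character
replaced = text.encode("ascii", "replace")  # substitutes "?"
```

The "ignore" and "replace" handlers are only appropriate when losing the character is acceptable; otherwise encode to UTF-8.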
You have to find out which encoding this character has at the source.
I guess this is ISO-8859-1 (European languages), in which case it's "ä", but you should check. It could also be Cyrillic or Greek.
See http://en.wikipedia.org/wiki/ISO/IEC_8859-1 for a complete list of characters in this encoding.
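To see why the source encoding matters, the following sketch decodes the same byte 0xE4 with three ISO-8859 codecs from Python's standard library. The variable names and the choice of candidate encodings are assumptions for illustration; the snippet runs on both Python 2.7 and Python 3.

```python
# The single byte behind the character in the error message.
raw = b"\xe4"

# Candidate source encodings (an illustrative choice: Latin-1, Cyrillic, Greek).
candidates = ["iso-8859-1", "iso-8859-5", "iso-8859-7"]

# Decode the same byte under each candidate and keep the resulting character.
decoded = {}
for name in candidates:
    decoded[name] = raw.decode(name)

# Print the code point each encoding assigns to the byte.
for name in candidates:
    print("%s -> U+%04X" % (name, ord(decoded[name])))
```

Each codec yields a different character for the same byte (ä, ф and δ respectively), so decoding with the wrong one produces wrong text without raising any error.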
Using this information, you can ask Python to convert it.
In Python 2.7:
>>> s = '\xe4'
>>> t = s.decode('iso-8859-1')
>>> print t
ä
>>> for c in t:
...     print ord(c)
...
228
>>> u = t.encode('utf-8')
>>> print u
ä
>>> for c in bytes(u):
...     print ord(c)
...
195
164
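The string t is a unicode object holding the single code point U+00E4 (228). It is not stored as ISO-8859-1; that was only the encoding of the source bytes in s. The string u is a byte string (str) holding the UTF-8 encoding of that character, which takes 2 bytes (195 and 164). Notice also that print shows "ä" for both only when the terminal uses UTF-8: it encodes t to the terminal's encoding and writes the bytes of u unchanged.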