 

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4'

I keep getting the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 27: ordinal not in range(128)

I already tried

  1. x.encode("ascii", "ignore")
  2. x.encode("utf-8")
  3. x.decode("utf-8")

However, nothing works.

toom asked Oct 27 '14 17:10




1 Answer

You first have to find out which encoding this character has at its source.

My guess is ISO-8859-1 (Western European languages), in which case the character is "ä", but you should check. It could also be Cyrillic or Greek.

See http://en.wikipedia.org/wiki/ISO/IEC_8859-1 for a complete list of characters in this encoding.
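To see why the source encoding matters, here is a minimal sketch of how the same byte 0xE4 decodes under three ISO-8859 variants. It is written for Python 3 (an assumption; the session in this answer is Python 2.7), and the candidate list is only an illustration, not something taken from the question.

```python
# Sketch: what the single byte 0xE4 means under a few candidate encodings.
# The candidate list is an assumption chosen for illustration.
raw = b"\xe4"

# Western European, Cyrillic, Greek
candidates = ["iso-8859-1", "iso-8859-5", "iso-8859-7"]

decoded = {}
for name in candidates:
    # bytes.decode() turns raw bytes into text using the named encoding.
    decoded[name] = raw.decode(name)

for name, text in decoded.items():
    # Print the code point instead of the character itself, so the output
    # does not depend on what the terminal can render.
    print("%-11s U+%04X" % (name, ord(text)))
```

Each encoding yields a different code point for the same byte, so decoding with the wrong one gives a wrong character without raising any error.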

Using this information, you can ask Python to convert it:

In Python 2.7:

>>> s = '\xe4'
>>> t = s.decode('iso-8859-1')
>>> print t
ä
>>> for c in t:
...   print ord(c)
...
228
>>> u = t.encode('utf-8')
>>> print u
ä
>>> for c in bytes(u):
...   print ord(c)
...
195
164

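t is a unicode object: it holds the code point U+00E4 (228), not ISO-8859-1 bytes. u is a byte string (str) holding the UTF-8 encoding of that character, which takes 2 bytes (195 and 164). Both print as "ä" here because print encodes t to the terminal's encoding and writes the bytes of u unchanged, so u only displays correctly on a UTF-8 terminal.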
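For comparison, a rough Python 3 sketch of the same round trip, assuming the data arrives as ISO-8859-1 bytes. In Python 3, bytes and str are separate types, so the decode and encode steps are explicit. The last part reproduces the error from the question and shows the standard "ignore" and "replace" error handlers. The variable names are mine.

```python
# Python 3 sketch; assumes the source bytes are ISO-8859-1.
raw = b"\xe4"                       # one byte, as read from the source

text = raw.decode("iso-8859-1")     # bytes -> str (Unicode text)
code_points = [ord(c) for c in text]

utf8 = text.encode("utf-8")         # str -> bytes, now encoded as UTF-8
utf8_bytes = list(utf8)             # iterating bytes yields ints in Python 3

print(code_points)
print(utf8_bytes)

# Encoding the same text as ASCII reproduces the error from the question,
# because U+00E4 lies outside the ASCII range 0-127.
error_message = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

# An explicit error handler avoids the exception, at the cost of dropping
# or replacing the character.
ignored = text.encode("ascii", "ignore")
replaced = text.encode("ascii", "replace")
print(ignored, replaced)
```

The error only appears when something encodes the text with a codec that cannot represent the character, so the usual fix is to decode once at the input boundary and encode once, with an explicit encoding such as UTF-8, at the output boundary.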

Mickaël Bucas answered Sep 28 '22 13:09