Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Unicode Encode Error

I'm reading and parsing an Amazon XML file and while the XML file shows a ' , when I try to print it I get the following error:

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)  

From what I've read online thus far, the error is coming from the fact that the XML file is in UTF-8, but Python wants to handle it as an ASCII encoded character. Is there a simple way to make the error go away and have my program print the XML as it reads?

like image 574
Alex B Avatar asked Jul 11 '10 19:07

Alex B


People also ask

How do I fix Unicode encode errors in Python?

Only a limited number of Unicode characters are mapped to strings. Thus, any character that is not-represented / mapped will cause the encoding to fail and raise UnicodeEncodeError. To avoid this error use the encode( utf-8 ) and decode( utf-8 ) functions accordingly in your code.

How do I fix encoding in Python?

The best way to attack the problem, as with many things in Python, is to be explicit. That means that every string that your code handles needs to be clearly treated as either Unicode or a byte sequence. The most systematic way to accomplish this is to make your code into a Unicode-only clean room.

What is Unicode error in Python?

When we use such a string as a parameter to any function, there is a possibility of the occurrence of an error. Such error is known as Unicode error in Python. We get such an error because any character after the Unicode escape sequence (“ \u ”) produces an error which is a typical error on windows.

Can not encode Python?

The Python "UnicodeEncodeError: 'ascii' codec can't encode character in position" occurs when we use the ascii codec to encode a string that contains non-ascii characters. To solve the error, specify the correct encoding, e.g. utf-8 . Here is an example of how the error occurs. Copied!


1 Answers

Likely, your problem is that you parsed it okay, and now you're trying to print the contents of the XML and you can't because theres some foreign Unicode characters. Try to encode your unicode string as ascii first:

unicodeData.encode('ascii', 'ignore') 

the 'ignore' part will tell it to just skip those characters. From the python docs:

>>> # Python 2: u = unichr(40960) + u'abcd' + unichr(1972) >>> u = chr(40960) + u'abcd' + chr(1972) >>> u.encode('utf-8') '\xea\x80\x80abcd\xde\xb4' >>> u.encode('ascii') Traceback (most recent call last):   File "<stdin>", line 1, in ? UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128) >>> u.encode('ascii', 'ignore') 'abcd' >>> u.encode('ascii', 'replace') '?abcd?' >>> u.encode('ascii', 'xmlcharrefreplace') '&#40960;abcd&#1972;' 

You might want to read this article: http://www.joelonsoftware.com/articles/Unicode.html, which I found very useful as a basic tutorial on what's going on. After the read, you'll stop feeling like you're just guessing what commands to use (or at least that happened to me).

like image 105
Scott Stafford Avatar answered Sep 20 '22 07:09

Scott Stafford