I'm reading and parsing an Amazon XML file and while the XML file shows a ' , when I try to print it I get the following error:
'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
From what I've read online thus far, the error is coming from the fact that the XML file is in UTF-8, but Python wants to handle it as an ASCII encoded character. Is there a simple way to make the error go away and have my program print the XML as it reads?
Only a limited number of Unicode characters are mapped to strings. Thus, any character that is not-represented / mapped will cause the encoding to fail and raise UnicodeEncodeError. To avoid this error use the encode( utf-8 ) and decode( utf-8 ) functions accordingly in your code.
The best way to attack the problem, as with many things in Python, is to be explicit. That means that every string that your code handles needs to be clearly treated as either Unicode or a byte sequence. The most systematic way to accomplish this is to make your code into a Unicode-only clean room.
When we use such a string as a parameter to any function, there is a possibility of the occurrence of an error. Such error is known as Unicode error in Python. We get such an error because any character after the Unicode escape sequence (“ \u ”) produces an error which is a typical error on windows.
The Python "UnicodeEncodeError: 'ascii' codec can't encode character in position" occurs when we use the ascii codec to encode a string that contains non-ascii characters. To solve the error, specify the correct encoding, e.g. utf-8 . Here is an example of how the error occurs. Copied!
Likely, your problem is that you parsed it okay, and now you're trying to print the contents of the XML and you can't because theres some foreign Unicode characters. Try to encode your unicode string as ascii first:
unicodeData.encode('ascii', 'ignore')
the 'ignore' part will tell it to just skip those characters. From the python docs:
>>> # Python 2: u = unichr(40960) + u'abcd' + unichr(1972) >>> u = chr(40960) + u'abcd' + chr(1972) >>> u.encode('utf-8') '\xea\x80\x80abcd\xde\xb4' >>> u.encode('ascii') Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128) >>> u.encode('ascii', 'ignore') 'abcd' >>> u.encode('ascii', 'replace') '?abcd?' >>> u.encode('ascii', 'xmlcharrefreplace') 'ꀀabcd޴'
You might want to read this article: http://www.joelonsoftware.com/articles/Unicode.html, which I found very useful as a basic tutorial on what's going on. After the read, you'll stop feeling like you're just guessing what commands to use (or at least that happened to me).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With