I am trying to parse xml which contains the some non ASCII cheracter,
the code looks like below
from lxml import etree
from lxml import objectify
content = u'<?xml version="1.0" encoding="utf-8"?><div>Order date : 05/08/2013 12:24:28</div>'
mail.replace('\xa0',' ')
xml = etree.fromstring(mail)
but it shows me error on the line 'content = ...' like
syntaxError: Non-ASCII character '\xc2' in file /home/projects/ztest/responce.py on line 3,
but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
in the terminal it's working but while running on the eclipse IDE it's giving me a error.
Don't know how to overcome..
Non-ASCII characters are those that are not encoded in ASCII, such as Unicode, EBCDIC, etc. ASCII is limited to 128 characters and was initially developed for the English language.
In order to use non-ASCII characters, Python requires explicit encoding and decoding of strings into Unicode. In IBM® SPSS® Modeler, Python scripts are assumed to be encoded in UTF-8, which is a standard Unicode encoding that supports non-ASCII characters.
Â, â (a-circumflex) is a letter of the Inari Sami, Skolt Sami, Romanian, and Vietnamese alphabets. This letter also appears in French, Friulian, Frisian, Portuguese, Turkish, Walloon, and Welsh languages as a variant of the letter "a". It is included in some romanization systems for Persian, Russian, and Ukrainian.
You should define source code encoding, add this to the top of your script:
# -*- coding: utf-8 -*-
The reason why it works differently in console and in the IDE is, likely, because of different default encodings set. You can check it by running:
import sys
print sys.getdefaultencoding()
Also see:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With