When I try to parse XML with lxml like this:
tree = etree.parse('xml.xml')
I get the following error:
lxml.etree.XMLSyntaxError: Unsupported encoding windows-1251
How can I read data from an XML with this encoding?
Thank you
I think you are using Python 2.x.
If so, I believe you need to use the open() function of the codecs module, like this:
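import codecs
from lxml import etree
with codecs.open(filename, 'rb', 'cp1251') as f:
    content = f.read()  # decoded from cp1251 to a Unicode string
# etree.parse() expects a file name or file object, so use fromstring() for text.
# Re-encode as UTF-8 and override the encoding declared in the XML header.
parser = etree.XMLParser(encoding='utf-8')
root = etree.fromstring(content.encode('utf-8'), parser)
The content read this way has been decoded from cp1251 to Unicode. I am not an expert in Unicode handling, so treat the rest as a sketch.
etree.parse() takes a file name or file object rather than the document text, so the string goes to etree.fromstring() instead. As far as I know, lxml refuses a Unicode string that still carries an encoding declaration, which is why the text is re-encoded to UTF-8 and the parser is told to use that encoding instead of the declared windows-1251.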
Note that when an encoding is given, codecs.open() always opens the underlying file in binary mode, even if the mode is 'r'.
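To check the decoding step on its own, here is a minimal sketch that uses only the standard library. The sample document, the helper name read_cp1251 and the temporary file are assumptions made up for illustration; they are not from the question. It only confirms that codecs.open() returns Unicode text for a cp1251 file and does not involve lxml at all.

```python
import codecs
import os
import tempfile

# Hypothetical sample document declared as windows-1251.
# The escapes spell a short Russian word, so this source file stays ASCII-only.
SAMPLE = (u'<?xml version="1.0" encoding="windows-1251"?>'
          u'<root>\u041f\u0440\u0438\u0432\u0435\u0442</root>')


def read_cp1251(path):
    """Return the file's contents decoded from cp1251 to a Unicode string."""
    with codecs.open(path, 'rb', 'cp1251') as f:
        return f.read()


def demo():
    # Write the sample as raw cp1251 bytes to a temporary file, then read it back.
    fd, path = tempfile.mkstemp(suffix='.xml')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(SAMPLE.encode('cp1251'))
        return read_cp1251(path)
    finally:
        os.remove(path)


text = demo()
```

If text compares equal to SAMPLE, the bytes were decoded correctly, and the remaining work is only getting the parser to accept the result.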
Hope that helps.