I'm transforming an XML document with XSLT. Running the script with Python 3 raises the error below; the same script runs without errors under Python 2.
    -> % python3 cstm/artefact.py
    Traceback (most recent call last):
      File "cstm/artefact.py", line 98, in <module>
        simplify_this_dataset('fisheries-service-des-peches.xml')
      File "cstm/artefact.py", line 85, in simplify_this_dataset
        xslt_root = etree.XML(xslt_content)
      File "lxml.etree.pyx", line 3012, in lxml.etree.XML (src/lxml/lxml.etree.c:67861)
      File "parser.pxi", line 1780, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102420)
    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

The script:

    #!/usr/bin/env python3
    # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
    # -*- coding: utf-8 -*-

    import os

    from lxml import etree


    def simplify_this_dataset(dataset):
        """Create a simplified version of an XML file.

        All attributes are removed and added back as elements instead.
        """
        module_path = os.path.dirname(os.path.abspath(__file__))
        # Text mode: read() returns an already decoded str
        data = open(module_path+'/data/ex-fire.xslt')
        xslt_content = data.read()
        # This is the call that raises the ValueError under Python 3
        xslt_root = etree.XML(xslt_content)
        dom = etree.parse(module_path+'/../CanSTM_dataset/'+dataset)
        transform = etree.XSLT(xslt_root)
        result = transform(dom)
        f = open(module_path+ '/../CanSTM_dataset/otra.xml', 'w')
        f.write(str(result))
        f.close()
lxml.etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
    data = open(module_path+'/data/ex-fire.xslt')
    xslt_content = data.read()
This implicitly decodes the bytes in the file to Unicode text, using the default encoding. (This might give wrong results, if the XML file isn't in that encoding.)
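To see what that implicit decoding does, here is a small standard-library sketch. The sample document, the file name and the explicitly chosen codecs are assumptions made up for the demonstration (the real script relies on the platform default encoding instead). It stores a file as ISO-8859-1, then reads it once in text mode with a mismatched codec and once in binary mode:

```python
import os
import tempfile

# Hypothetical sample document, declared and stored as ISO-8859-1.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>pêches</name>'
).encode("iso-8859-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as f:
        f.write(xml_bytes)

    # Text mode: Python decodes the bytes itself and ignores the XML
    # declaration. The codec is forced to UTF-8 here (an assumption, to
    # make the mismatch reproducible); errors="replace" keeps it from raising.
    with open(path, encoding="utf-8", errors="replace") as f:
        as_text = f.read()

    # Binary mode: the bytes come back untouched, so an XML parser can
    # still apply the encoding named in the declaration.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, type(as_bytes).__name__)
print("pêches" in as_text)                        # the accented letter was lost
print("pêches" in as_bytes.decode("iso-8859-1"))  # intact when decoded correctly
```

The text-mode read returns a str whose content depends on the codec Python picked, while the binary read leaves the decision to whoever consumes the bytes.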
    xslt_root = etree.XML(xslt_content)
XML has its own mechanism for handling and signalling encodings: the <?xml encoding="..."?> prolog. If you pass a Unicode string starting with <?xml encoding="..."?> to a parser, the parser would like to reinterpret the rest of the byte string using that encoding, but it can't, because you've already decoded the byte input to a Unicode string.
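A minimal way to reproduce this, assuming lxml is installed; the sample document is made up for illustration. The same content is rejected as a str and accepted as bytes:

```python
from lxml import etree

# Hypothetical document with an encoding declaration, held as a str.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<root><item>pêches</item></root>"
)

# Passing the already decoded str reproduces the error from the question.
try:
    etree.XML(xml_text)
    error = None
except ValueError as exc:
    error = str(exc)

# Passing bytes lets the parser apply the declared encoding itself.
root = etree.XML(xml_text.encode("utf-8"))

print(error)
print(root.tag, root[0].text)
```

The only difference between the two calls is the type of the argument, which is why switching the file to binary mode is enough to make the original script work.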
Instead, you should either pass the undecoded byte string to the parser:
    data = open(module_path+'/data/ex-fire.xslt', 'rb')
    xslt_content = data.read()
    xslt_root = etree.XML(xslt_content)
or, better, just have the parser read straight from the file:
    xslt_root = etree.parse(module_path+'/data/ex-fire.xslt')
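Putting it together, a sketch of the whole transformation without any manual decoding could look like this. The stylesheet, the sample input, the file names and the helper simplify() are all hypothetical stand-ins, since ex-fire.xslt and the dataset are not shown in the question; the stylesheet only mimics the "attributes become elements" idea from the docstring:

```python
import os
import tempfile

from lxml import etree

# Hypothetical stylesheet: copies elements and turns each attribute
# into a child element of the same name.
XSLT_BYTES = """<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:element name="{name()}"><xsl:value-of select="."/></xsl:element>
  </xsl:template>
</xsl:stylesheet>
""".encode("utf-8")

# Hypothetical input document.
XML_BYTES = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<dataset id="42"><title lang="fr">pêches</title></dataset>'
).encode("utf-8")


def simplify(xslt_path, xml_path, out_path):
    """Apply the stylesheet; the parser reads both files directly."""
    transform = etree.XSLT(etree.parse(xslt_path))
    result = transform(etree.parse(xml_path))
    # write_output() serializes according to the stylesheet's xsl:output,
    # so no str()/encode() round trip is needed.
    result.write_output(out_path)


with tempfile.TemporaryDirectory() as tmp:
    xslt_path = os.path.join(tmp, "sample.xslt")
    xml_path = os.path.join(tmp, "sample.xml")
    out_path = os.path.join(tmp, "out.xml")
    with open(xslt_path, "wb") as f:
        f.write(XSLT_BYTES)
    with open(xml_path, "wb") as f:
        f.write(XML_BYTES)

    simplify(xslt_path, xml_path, out_path)
    out_root = etree.parse(out_path).getroot()

print(etree.tostring(out_root, encoding="unicode"))
```

Writing the result with write_output() instead of f.write(str(result)) also keeps the output side free of implicit encoding choices, though the original str(result) approach is not what triggers the error.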
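You can also decode the UTF-8 bytes and re-encode them as ASCII before passing them to etree.XML. This only works if the file was opened in binary mode ('rb'), so that data.read() returns bytes, and if the stylesheet contains nothing but ASCII characters; otherwise encode('ascii') raises UnicodeEncodeError.

    data = open(module_path+'/data/ex-fire.xslt', 'rb')
    xslt_content = data.read()
    xslt_content = xslt_content.decode('utf-8').encode('ascii')
    xslt_root = etree.XML(xslt_content)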