I have to create an .xml file that is pretty-printed and also has the encoding declaration. It should look like this:
<?xml version='1.0' encoding='utf-8'?>
<main>
    <sub>
        <name>Ana</name>
        <detail />
        <type>smart</type>
    </sub>
</main>
I know how to get the pretty print and the declaration, but not at the same time. To obtain the UTF-8 declaration, but no pretty print, I use the code below:
f = open(xmlPath, "w")
et.write(f, encoding='utf-8', xml_declaration=True) 
f.close()
But if I want the pretty print, I have to convert the XML tree into a string, and I lose the declaration. I use this code:
from xml.dom import minidom
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ")
with open(xmlPath, "w") as f:
    f.write(xmlstr.encode('utf-8'))
    f.close()
With this last code, I get the pretty print, except that the first line is:
<?xml version="1.0" ?>
I might just as well replace this with
<?xml version='1.0' encoding='utf-8'?>
but I don't find this to be the most Pythonic method.
I use the xml module and I prefer not to install extra modules because the script has to be run from various computers with standard Python. But if it's not possible, I will install other modules.
Later Edit:
In the end, with Lenz's help, I use this:
#ET=lxml.etree
xmlPath=os.path.join(output_folderXML ,"test.xml")
xmlstr= ET.tostring(root, encoding='UTF-8', xml_declaration=True, pretty_print=True)
with open(xmlPath, "w") as f:
    f.write(xmlstr)
    f.close()
I need to know if it is safe to write the result of the "tostring" method to the .xml file in "w" mode, not "wb". As I said in one of the comments below, with "wb" I don't get the pretty print when I open the xml file in Notepad, but with "w", I do. Also, I have checked the xml file written in "w" mode, and special characters like "ü" are there. I only need a competent opinion that what I do is technically OK.
The most elegant solution is certainly using the third-party library lxml, which is being used a lot – for good reasons.
It offers both a pretty_print and an xml_declaration parameter in the tostring() method, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:
>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
                         pretty_print=True)
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>
However, I understand your desire to use the "included batteries" only.
As far as I can see, xml.etree.ElementTree has no means of changing the indentation automatically.
But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!
>>> doc = minidom.parseString(ET.tostring(root))
>>> print doc.toprettyxml(encoding='utf8')
<?xml version="1.0" encoding="utf8"?>
<main>
    <sub>
        <name>Ana</name>
        <detail/>
        <type>smart</type>
    </sub>
</main>
(Be aware that the returned string is already encoded and that you should write it to a file opened in binary mode ("wb") and without further encoding.)
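Putting that note into practice, here is a minimal sketch of the minidom route written for Python 3. The tree built here and the output path are hypothetical stand-ins for the question's root and xmlPath; only the toprettyxml() call and the binary-mode write are the point.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical stand-in for the tree from the question.
root = ET.Element("main")
sub = ET.SubElement(root, "sub")
ET.SubElement(sub, "name").text = "Ana"
ET.SubElement(sub, "detail")
ET.SubElement(sub, "type").text = "smart"

# With an encoding argument, toprettyxml() returns bytes that already
# carry the encoding declaration.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent="    ", encoding="utf-8"
)

# Hypothetical output path; the question uses xmlPath instead.
xml_path = os.path.join(tempfile.gettempdir(), "test.xml")

# Binary mode: the data is already encoded, so nothing re-encodes it
# and the bytes on disk match the declared encoding.
with open(xml_path, "wb") as f:
    f.write(pretty)
```

If a viewer such as an older Notepad shows the result on a single line, that is probably a line-ending matter rather than lost indentation. toprettyxml() also accepts a newl argument (for example newl="\r\n") if Windows line endings are needed.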
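import xml.etree.ElementTree as ET
from xml.dom import minidom
# root and xmlPath are the names used in the question.
# With encoding set, toprettyxml() returns already-encoded bytes.
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ", encoding='UTF-8')
# Write the bytes unchanged in binary mode; the with block closes the file.
with open(xmlPath, "wb") as f:
    f.write(xmlstr)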
This should probably resolve your issue without external libraries such as lxml.
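Another standard-library option, assuming Python 3.9 or later is available on the target machines: xml.etree.ElementTree gained an indent() function in that release, so the minidom round trip may not be needed at all. The sketch below builds a hypothetical tree matching the question's example and writes to an in-memory buffer only to keep it self-contained.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tree from the question.
root = ET.Element("main")
sub = ET.SubElement(root, "sub")
ET.SubElement(sub, "name").text = "Ana"
ET.SubElement(sub, "detail")
ET.SubElement(sub, "type").text = "smart"

# ET.indent() (Python 3.9+) inserts indentation whitespace in place.
ET.indent(root, space="    ")

# write() emits the declaration and encodes the output itself.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
xml_bytes = buf.getvalue()

print(xml_bytes.decode("utf-8"))
```

In a real script, the same write() call can take a file path in place of the buffer, which avoids choosing between "w" and "wb" by hand.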