I'd like to preserve comments as faithfully as possible while manipulating XML.
I managed to preserve comments, but the contents are getting XML-escaped.
    #!/usr/bin/env python
    # add_host_to_tomcat.py

    import xml.etree.ElementTree as ET
    from CommentedTreeBuilder import CommentedTreeBuilder
    parser = CommentedTreeBuilder()

    if __name__ == '__main__':
        filename = "/opt/lucee/tomcat/conf/server.xml"

        # this is the important part: use the comment-preserving parser
        tree = ET.parse(filename, parser)

        # get the node to add a child to
        engine_node = tree.find("./Service/Engine")

        # add a node: Engine.Host
        host_node = ET.SubElement(
            engine_node,
            "Host",
            name="local.mysite.com",
            appBase="webapps"
        )

        # add a child to new node: Engine.Host.Context
        ET.SubElement(
            host_node,
            'Context',
            path="",
            docBase="/path/to/doc/base"
        )

        tree.write('out.xml')
    #!/usr/bin/env python
    # CommentedTreeBuilder.py

    from xml.etree import ElementTree

    class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
        def __init__ ( self, html = 0, target = None ):
            ElementTree.XMLTreeBuilder.__init__( self, html, target )
            self._parser.CommentHandler = self.handle_comment

        def handle_comment ( self, data ):
            self._target.start( ElementTree.Comment, {} )
            self._target.data( data )
            self._target.end( ElementTree.Comment )
However, comments like:
    <!--
        EXAMPLE HOST ENTRY:
        <Host name="lucee.org" appBase="webapps">
            <Context path="" docBase="/var/sites/getrailo.org" />
            <Alias>www.lucee.org</Alias>
            <Alias>my.lucee.org</Alias>
        </Host>

        HOST ENTRY TEMPLATE:
        <Host name="[ENTER DOMAIN NAME]" appBase="webapps">
            <Context path="" docBase="[ENTER SYSTEM PATH]" />
            <Alias>[ENTER DOMAIN ALIAS]</Alias>
        </Host>
    -->
End up as:
    <!--
        EXAMPLE HOST ENTRY:
        &lt;Host name="lucee.org" appBase="webapps"&gt;
            &lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;
            &lt;Alias&gt;www.lucee.org&lt;/Alias&gt;
            &lt;Alias&gt;my.lucee.org&lt;/Alias&gt;
        &lt;/Host&gt;

        HOST ENTRY TEMPLATE:
        &lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;
            &lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;
            &lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;
        &lt;/Host&gt;
    -->
I also tried self._target.data( saxutils.unescape(data) ) in CommentedTreeBuilder.py, but it didn't seem to do anything. In fact, I think the problem happens somewhere after the handle_comment() step.
By the way, this question is similar to this.
Tested with Python 2.7 and 3.5, the following code should work as intended.
    #!/usr/bin/env python
    # CommentedTreeBuilder.py
    from xml.etree import ElementTree

    class CommentedTreeBuilder(ElementTree.TreeBuilder):
        def comment(self, data):
            self.start(ElementTree.Comment, {})
            self.data(data)
            self.end(ElementTree.Comment)
Then, in the main code, use

    parser = ElementTree.XMLParser(target=CommentedTreeBuilder())

as the parser instead of the current one.
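To check the approach end to end without touching a real server.xml, here is a minimal sketch for Python 3 that parses a small in-memory document and serializes it again. The SAMPLE string and the roundtrip() helper are made up for illustration; only CommentedTreeBuilder comes from the answer above.

```python
from xml.etree import ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    """TreeBuilder that keeps comments as Comment nodes in the tree."""

    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


# Hypothetical sample standing in for the real server.xml.
SAMPLE = (
    '<Server><Service><Engine>'
    '<!-- <Host name="lucee.org" appBase="webapps"></Host> -->'
    '</Engine></Service></Server>'
)


def roundtrip(xml_text):
    """Parse with the comment-preserving builder, then serialize again."""
    parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
    root = ElementTree.fromstring(xml_text, parser=parser)
    return ElementTree.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The comment should come back with its angle brackets intact.
    print(roundtrip(SAMPLE))
```

The same parser object can be passed to ElementTree.parse(filename, parser) when reading from a file, as in the question's script.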
By the way, comments work correctly out of the box with lxml. That is, you can just do

    import lxml.etree as ET
    tree = ET.parse(filename)

without needing any of the above.
Python 3.8 added the insert_comments argument to TreeBuilder. From the documentation:

    class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)

When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).
Example:
parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
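As a rough sketch of how this fits the original task on Python 3.8 or later, the snippet below adds a Host entry to a small in-memory document and serializes it. SAMPLE and the add_host() helper are hypothetical names chosen for illustration. The sample also carries one comment before the root element to show the documented limitation: that comment does not make it into the tree.

```python
from xml.etree import ElementTree

# Hypothetical sample: one comment before the root, one inside it.
SAMPLE = (
    '<!-- outside the root -->'
    '<Server><Service><Engine>'
    '<!-- <Host name="[ENTER DOMAIN NAME]" appBase="webapps" /> -->'
    '</Engine></Service></Server>'
)


def add_host(xml_text, name, doc_base):
    """Add Engine.Host.Context while keeping comments inside the root."""
    parser = ElementTree.XMLParser(
        target=ElementTree.TreeBuilder(insert_comments=True)
    )
    root = ElementTree.fromstring(xml_text, parser=parser)

    # Same lookup and edits as in the question's script.
    engine = root.find("./Service/Engine")
    host = ElementTree.SubElement(engine, "Host", name=name, appBase="webapps")
    ElementTree.SubElement(host, "Context", path="", docBase=doc_base)

    return ElementTree.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_host(SAMPLE, "local.mysite.com", "/path/to/doc/base"))
```

If comments before or after the root element matter, for example a license header at the top of the file, this option alone will not keep them.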