Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faithfully Preserve Comments in Parsed XML

I'd like to preserve comments as faithfully as possible while manipulating XML.

I managed to preserve comments, but the contents are getting XML-escaped.

#!/usr/bin/env python # add_host_to_tomcat.py  import xml.etree.ElementTree as ET from CommentedTreeBuilder import CommentedTreeBuilder parser = CommentedTreeBuilder()  if __name__ == '__main__':     filename = "/opt/lucee/tomcat/conf/server.xml"      # this is the important part: use the comment-preserving parser     tree = ET.parse(filename, parser)      # get the node to add a child to     engine_node = tree.find("./Service/Engine")      # add a node: Engine.Host     host_node = ET.SubElement(         engine_node,         "Host",         name="local.mysite.com",         appBase="webapps"     )     # add a child to new node: Engine.Host.Context     ET.SubElement(         host_node,         'Context',         path="",         docBase="/path/to/doc/base"     )      tree.write('out.xml') 
#!/usr/bin/env python # CommentedTreeBuilder.py  from xml.etree import ElementTree  class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):     def __init__ ( self, html = 0, target = None ):         ElementTree.XMLTreeBuilder.__init__( self, html, target )         self._parser.CommentHandler = self.handle_comment      def handle_comment ( self, data ):         self._target.start( ElementTree.Comment, {} )         self._target.data( data )         self._target.end( ElementTree.Comment ) 

However, comments like like:

  <!-- EXAMPLE HOST ENTRY:     <Host name="lucee.org" appBase="webapps">          <Context path="" docBase="/var/sites/getrailo.org" />      <Alias>www.lucee.org</Alias>      <Alias>my.lucee.org</Alias>     </Host>  HOST ENTRY TEMPLATE:     <Host name="[ENTER DOMAIN NAME]" appBase="webapps">          <Context path="" docBase="[ENTER SYSTEM PATH]" />      <Alias>[ENTER DOMAIN ALIAS]</Alias>     </Host>   --> 

End up as:

  <!--             EXAMPLE HOST ENTRY:     &lt;Host name="lucee.org" appBase="webapps"&gt;          &lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;          &lt;Alias&gt;www.lucee.org&lt;/Alias&gt;          &lt;Alias&gt;my.lucee.org&lt;/Alias&gt;     &lt;/Host&gt;      HOST ENTRY TEMPLATE:     &lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;          &lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;          &lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;     &lt;/Host&gt;    --> 

I also tried self._target.data( saxutils.unescape(data) ) in CommentedTreeBuilder.py, but it didn't seem to do anything. In fact, I think the problem happens somewhere after the handle_commment() step.

By the way, this question is similar to this.

like image 772
Jamie Jackson Avatar asked Nov 06 '15 19:11

Jamie Jackson


2 Answers

Tested with Python 2.7 and 3.5, the following code should work as intended.

#!/usr/bin/env python # CommentedTreeBuilder.py from xml.etree import ElementTree  class CommentedTreeBuilder(ElementTree.TreeBuilder):     def comment(self, data):         self.start(ElementTree.Comment, {})         self.data(data)         self.end(ElementTree.Comment) 

Then, in the main code use

parser = ElementTree.XMLParser(target=CommentedTreeBuilder()) 

as the parser instead of the current one.

By the way, comments work correctly out of the box with lxml. That is, you can just do

import lxml.etree as ET tree = ET.parse(filename) 

without needing any of the above.

like image 160
Martin Valgur Avatar answered Oct 14 '22 00:10

Martin Valgur


Python 3.8 added the insert_comments argument to TreeBuilder which:

class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)

When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).

Example:

parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True)) 
like image 41
bernie Avatar answered Oct 14 '22 00:10

bernie