
How to keep comments while parsing XML using Python / ElementTree

I am currently using Python 2.4.3 and am not allowed to upgrade.

I want to change the value of a given attribute in one or more tags while keeping the XML comments in the updated file.

I have managed to create a Python script that takes an XML file as an argument and, for each tag specified, changes an attribute, as shown below:

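def update(file, state):
    global Etree
    try:
        # Python 2.4 has no xml.etree, so use the standalone elementtree package.
        # Bind it to the name Etree, which the rest of the function uses.
        from elementtree import ElementTree as Etree
        print '*** using ElementTree'
    except ImportError, e:
        print '***'
        print '*** Error: Must install either ElementTree or lxml.'
        print '***'
        raise ImportError, 'must install either ElementTree or lxml'
    #end try

    # The default parser builds the element tree but discards comments.
    doc = Etree.parse(file)
    root = doc.getroot()

    # Update the attribute on every matching element, at any depth.
    for element in root.findall('.//StateManageable'):
        element.attrib['initialState'] = state
    #end for
    doc.write(file)
#end def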

This works, and the "initialState" attributes are updated. However, my original XML contains a lot of XML comments, and they are all missing from the written file, which is bad.

I suspect that parse only retrieves the XML structure, but I thought XML comments were part of the structure. I also realize that the "human-readable" formatting of my original document is lost, but I understand that this is expected behavior and that I need to reformat afterwards using xmllint --format or XSL.
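The behavior described above can be reproduced with a small round trip. This is only a sketch: it uses the xml.etree.ElementTree module bundled with Python 3 rather than the standalone elementtree package from the question, and the SAMPLE document and the roundtrip_default helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the StateManageable tag and the
# initialState attribute come from the question.
SAMPLE = """<config>
  <!-- keep me -->
  <StateManageable initialState="STOP"/>
</config>"""


def roundtrip_default(xml_text):
    """Parse with the default parser and serialize the tree again."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


result = roundtrip_default(SAMPLE)
# The element survives the round trip, but the comment is not in the output.
print(result)
```

The default tree builder never creates nodes for comments, so there is nothing for the writer to emit later.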

rhellem asked Dec 17 '10 21:12


1 Answer

I know this is old now, but I stumbled across the answer above about how to retain comments. Frederik's published instructions for putting comments into the tree still work with current versions of ElementTree, but they do more than I need. They wrap the XML in an extra element, which is undesirable for me, and I don't need processing instructions preserved, only comments. So I trimmed the class he provided on the site down to this:

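import xml.etree.ElementTree as ET

class PCParser(ET.XMLTreeBuilder):
    """Parser that keeps comments as nodes in the parsed tree."""

    def __init__(self):
        ET.XMLTreeBuilder.__init__(self)
        # assumes ElementTree 1.2.X
        # Route the underlying parser's comment events to our handler.
        # _parser is a private attribute, so this is undocumented usage.
        self._parser.CommentHandler = self.handle_comment

    def handle_comment(self, data):
        # Insert the comment into the tree as an ET.Comment node.
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)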

To use this, create an instance of the class and pass it as the parser argument to ElementTree.parse(), like this:

parser = PCParser()
self.tree = ET.parse(self.templateOut, parser=parser)

I take no credit whatsoever for the code, or for the undocumented use of ElementTree, but it works for me in preserving only comments without affecting the original document structure. Note that because it relies on ElementTree internals, any future change to ElementTree (which seems unlikely at this point after all these years) could break it.
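For readers who are not tied to an old interpreter, here is a minimal sketch of the same idea using only documented API. It assumes Python 3.8 or newer, where TreeBuilder accepts insert_comments=True, so it does not help on Python 2.4.3. The helper names parse_keeping_comments and update_initial_state and the SAMPLE document are hypothetical. Only the StateManageable tag and the initialState attribute come from the question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input standing in for the real file.
SAMPLE = b"""<config>
  <!-- do not lose this comment -->
  <StateManageable initialState="STOP"/>
</config>"""


def parse_keeping_comments(source):
    """Parse a file path or file object, keeping comments in the tree."""
    # Assumption: Python 3.8+, where TreeBuilder takes insert_comments.
    builder = ET.TreeBuilder(insert_comments=True)
    parser = ET.XMLParser(target=builder)
    return ET.parse(source, parser=parser)


def update_initial_state(source, state):
    """Set initialState on every StateManageable element in the document."""
    tree = parse_keeping_comments(source)
    for element in tree.getroot().iter("StateManageable"):
        element.set("initialState", state)
    return tree


tree = update_initial_state(io.BytesIO(SAMPLE), "START")
output = ET.tostring(tree.getroot(), encoding="unicode")
# The comment is still present and the attribute has the new value.
print(output)
```

With a real file, tree.write(path) would take the place of the ET.tostring call. The sample only places comments inside the root element. Comments before or after the root are not covered by this sketch.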

Jon Thomason answered Oct 04 '22 00:10