I am trying to build a Python script that will take in an XML document and remove all of the comment blocks from it.
I tried something along the lines of:
from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse(file)
commentElements = tree.findall('//comment()')
for element in commentElements:
    element.parentNode.remove(element)
Doing this yields a weird error from Python: KeyError: '()'.
I know there are ways to easily edit the file using other methods (like sed), but I have to do it in a Python script.
comment() is an XPath node test that is not supported by ElementTree. You can use comment() with lxml; this library is quite similar to ElementTree and it has full support for XPath 1.0.
Here is how you can remove comments with lxml:
from lxml import etree
XML = """<root>
<!-- COMMENT 1 -->
<x>TEXT 1</x>
<y>TEXT 2 <!-- COMMENT 2 --></y>
</root>"""
tree = etree.fromstring(XML)
comments = tree.xpath('//comment()')
for c in comments:
    p = c.getparent()
    p.remove(c)
print(etree.tostring(tree).decode())
Output:
<root>
<x>TEXT 1</x>
<y>TEXT 2 </y>
</root>
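If the input is a file rather than a string, the same approach works with etree.parse() and ElementTree.write(). A minimal sketch (the file names input.xml and output.xml are placeholders):
from lxml import etree

tree = etree.parse("input.xml")           # parse the whole document from disk
for c in tree.xpath("//comment()"):       # find every comment node
    parent = c.getparent()
    if parent is not None:                # skip comments outside the root element
        parent.remove(c)                  # detach the comment from its parent
tree.write("output.xml")                  # write the cleaned document back out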
Use strip_tags() from lxml.etree
from lxml import etree
XML = """<root>
<!-- COMMENT 1 -->
<x>TEXT 1</x>
<y>TEXT 2 <!-- COMMENT 2 --></y>
</root>"""
tree = etree.fromstring(XML)
print(etree.tostring(tree).decode())
etree.strip_tags(tree, etree.Comment)
print(etree.tostring(tree).decode())
Output:
<root>
<!-- COMMENT 1 -->
<x>TEXT 1</x>
<y>TEXT 2 <!-- COMMENT 2 --></y>
</root>
<root>
<x>TEXT 1</x>
<y>TEXT 2 </y>
</root>
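A close relative of strip_tags() is etree.strip_elements(), which removes the matched nodes outright; for comments the effect is essentially the same. A minimal sketch:
from lxml import etree

XML = """<root>
<!-- COMMENT 1 -->
<x>TEXT 1</x>
<y>TEXT 2 <!-- COMMENT 2 --></y>
</root>"""

tree = etree.fromstring(XML)
# strip_elements() drops the matched nodes; with_tail=False keeps any
# text that happens to follow a comment instead of discarding it.
etree.strip_elements(tree, etree.Comment, with_tail=False)
print(etree.tostring(tree).decode())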
The same result as https://stackoverflow.com/a/3317008/1458574, but using lxml's remove_comments parser option:
from lxml import etree
import sys
XML = open(sys.argv[1]).read()
parser = etree.XMLParser(remove_comments=True)
tree = etree.fromstring(XML, parser=parser)
print(etree.tostring(tree).decode())
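If you would rather not read the file into a string first, the same remove_comments parser can be handed to etree.parse(). A minimal sketch (input.xml and output.xml are placeholder names):
from lxml import etree

parser = etree.XMLParser(remove_comments=True)
tree = etree.parse("input.xml", parser)   # comments never enter the tree
tree.write("output.xml")                  # serialize the comment-free document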
This is the solution I implemented using minidom:
def removeCommentNodes(self):
    for tag in self.dom.getElementsByTagName("*"):
        # iterate over a copy so removals don't disturb the live NodeList
        for n in list(tag.childNodes):
            if n.nodeType == n.COMMENT_NODE:
                n.parentNode.removeChild(n)
In practice I first retrieve all the tags in the XML, then for each tag I look for comment nodes and remove any I find (self.dom is a reference to the parsed XML document).
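Outside of a class, the same idea can be written as a small standalone script; a rough sketch with placeholder file names (note that, like the method above, it only visits comments that live inside elements):
import xml.dom.minidom as minidom

doc = minidom.parse("input.xml")
for tag in doc.getElementsByTagName("*"):
    # copy childNodes into a list so removals don't disturb the iteration
    for n in list(tag.childNodes):
        if n.nodeType == n.COMMENT_NODE:
            tag.removeChild(n)
            n.unlink()                    # free the detached node
with open("output.xml", "w") as f:
    doc.writexml(f)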