Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clean xml ==> Remove line if any empty tags

Tags:

python

regex

xml

I would like to clean up my xml so that not only is it valid XML, but it is formatted in a very human readable way. For example:

<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>

I would like to remove any lines with an empty tag, leaving:

<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
</Items>

I tried doing this using a regex, but haven't been having much luck in terms of leaving it in a readable format:

txt = etree.tostring(self.xml_node, pretty_print=True)
txt = re.sub(r'<[a-zA-Z]+/>\n', '', txt)

What would be the best way to accomplish the above?

like image 596
David542 Avatar asked Jun 04 '15 19:06

David542


People also ask

How do I remove an empty XML tag?

You can to remove empty XML tags from messages for optimization. For example, an XML representation of an integration object might have unused integration components. You can use the siebel_ws_param:RemoveEmptyTags parameter to remove empty tags when making Web service calls.

Can XML have empty tags?

Empty XML ElementsAn element with no content is said to be empty. The two forms produce identical results in XML software (Readers, Parsers, Browsers). Empty elements can have attributes.


1 Answers

Use an XML parser.

The idea is to find all empty nodes with //*[not(node())] XPath expression and remove them from the tree. Example, using lxml:

from lxml import etree


data = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>
"""

root = etree.fromstring(data)
for element in root.xpath(".//*[not(node())]"):
    element.getparent().remove(element)

print etree.tostring(root, pretty_print=True)
like image 144
alecxe Avatar answered Oct 08 '22 18:10

alecxe