I would like to clean up my XML so that not only is it valid XML, but it is also formatted in a very human-readable way. For example:
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
</Items>
I would like to remove any lines with an empty tag, leaving:
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
</Items>
I tried doing this with a regex, but haven't had much luck keeping the output in a readable format:
txt = etree.tostring(self.xml_node, pretty_print=True, encoding="unicode")
txt = re.sub(r'<[a-zA-Z]+/>\n', '', txt)
What would be the best way to accomplish the above?
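The formatting problem with the regex above can be reproduced without lxml. This is a minimal sketch, assuming pretty-printed output indented with two spaces (the txt string below is hand-written, not captured from lxml). The original pattern deletes the tag and its newline but not the indentation in front of it, so those leftover spaces end up before the closing </Items> tag. Anchoring the pattern to the start of each line removes the whole line instead.

```python
import re

# Hand-written stand-in for pretty-printed output (two-space indentation).
txt = (
    "<Items>\n"
    "  <Name>Hello</Name>\n"
    "  <Cost>9.99</Cost>\n"
    "  <Condition/>\n"
    "</Items>\n"
)

# Original pattern: removes "<Condition/>\n" but leaves the two spaces
# before it, which then get glued onto the "</Items>" line.
naive = re.sub(r'<[a-zA-Z]+/>\n', '', txt)

# Line-anchored pattern: with re.MULTILINE, ^ matches at every line start,
# so leading spaces/tabs, the self-closing tag and the newline all go.
fixed = re.sub(r'^[ \t]*<[A-Za-z_][\w.:-]*\s*/>[ \t]*\n', '', txt,
               flags=re.MULTILINE)

print(naive)
print(fixed)
```

This still only handles attribute-less self-closing tags that sit alone on a line, which is why a parser-based approach is more robust.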
You can remove empty XML tags from messages to optimize them. For example, an XML representation of an integration object might contain unused integration components. In Siebel, you can use the siebel_ws_param:RemoveEmptyTags parameter to remove empty tags when making Web service calls.
Empty XML elements: an element with no content is said to be empty. It can be written either as <Condition></Condition> or in the self-closing form <Condition/>; the two forms produce identical results in XML software (readers, parsers, browsers). Empty elements can have attributes.
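A quick way to check this claim, sketched with Python's standard-library xml.etree.ElementTree rather than lxml (chosen here only so the snippet has no third-party dependency): both spellings parse to an element with no children and no text, they serialize identically, and an empty element can still carry attributes.

```python
import xml.etree.ElementTree as ET

# Long and self-closing spellings of the same empty element.
long_form = ET.fromstring("<Condition></Condition>")
short_form = ET.fromstring("<Condition/>")

for elem in (long_form, short_form):
    # Neither form has child elements or text.
    print(elem.tag, len(elem), repr(elem.text))

# Serializing both gives identical bytes.
print(ET.tostring(long_form) == ET.tostring(short_form))

# An empty element may still have attributes.
with_attr = ET.fromstring('<Condition status="unknown"/>')
print(len(with_attr), with_attr.attrib)
```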
Use an XML parser.
The idea is to find all empty nodes with the //*[not(node())] XPath expression and remove them from the tree. Example, using lxml:
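from lxml import etree

data = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
</Items>
"""

# remove_blank_text drops whitespace-only text so pretty_print can re-indent
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(data, parser)

# Find every element with no child nodes (no text, no children) and detach it
for element in root.xpath(".//*[not(node())]"):
    element.getparent().remove(element)

print(etree.tostring(root, pretty_print=True, encoding="unicode"))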
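The single XPath pass above removes only elements that are empty in the original tree; if removing a child leaves its parent empty (for example a hypothetical <Wrapper><Condition/></Wrapper>), the parent survives. Below is a minimal sketch of one way to handle that, swapping in the standard-library xml.etree.ElementTree (which has no getparent()) and a recursive bottom-up walk. The helpers is_empty and prune_empty are hypothetical names; unlike not(node()), is_empty treats whitespace-only text as empty, and like the XPath answer it ignores attributes. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

def is_empty(elem):
    """True if elem has no child elements and no non-whitespace text.

    Attributes are ignored, as with the //*[not(node())] XPath approach.
    """
    return len(elem) == 0 and not (elem.text and elem.text.strip())

def prune_empty(elem):
    """Hypothetical helper: remove empty descendants of elem, bottom-up.

    Children are pruned before their parent is tested, so a parent that
    becomes empty after cleanup is removed in the same pass. The element
    passed in (e.g. the root) is never removed itself.
    """
    for child in list(elem):  # copy: elem is mutated inside the loop
        prune_empty(child)
        if is_empty(child):
            elem.remove(child)  # the child's tail text goes with it

data = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
<Wrapper><Condition/></Wrapper>
</Items>
"""

root = ET.fromstring(data)
prune_empty(root)
ET.indent(root, space="  ")  # replace whitespace-only text/tails with indentation
out = ET.tostring(root, encoding="unicode")
print(out)
```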