How to remove XML elements that are empty or contain only whitespace?

I need to remove elements like this:

<text> </text>

I have code that works when an element has no content at all, but how do I also handle elements that contain only whitespace?

Code:

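from lxml import etree

doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")

def remove_empty_elements(doc):
    # Select elements that have no child nodes at all (no elements, no text)
    for element in doc.xpath('//*[not(node())]'):
        element.getparent().remove(element)

remove_empty_elements(doc)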

I also need to do this with lxml, not BeautifulSoup.

Anna asked Dec 02 '25 10:12


1 Answer

This XPath,

//*[not(*)][not(normalize-space())]

will select all leaf elements (elements with no child elements) whose content is empty or consists only of whitespace.

For your example specifically,

<root><a>1</a><b><c></c></b><d></d></root>

these elements will be selected: c and d.
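To apply this with lxml, one option is to swap this XPath into the question's `remove_empty_elements` function. This is a minimal sketch assuming a single pass is enough; the guard against removing the root is an addition not in the question. Note that it leaves `<b/>` behind, because `b` only becomes empty after `c` has been removed.

```python
from lxml import etree

doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")

def remove_empty_elements(doc):
    # Leaf elements (no child elements) whose string value is empty
    # or whitespace-only once normalize-space() is applied.
    for element in doc.xpath('//*[not(*)][not(normalize-space())]'):
        parent = element.getparent()
        if parent is not None:  # the root element has no parent
            parent.remove(element)

remove_empty_elements(doc)
print(etree.tostring(doc))  # b'<root><a>1</a><b/></root>'
```

The XPath result is a plain list built before the loop starts, so removing elements while iterating over it is safe.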

For an example that also includes whitespace-only elements,

<root>
  <a>1</a>
  <b>
    <c></c>
  </b>
  <d/>
  <e>     </e>
  <f>
  </f>
</root>

these elements will be selected: c, d, e, and f.
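If elements that only become empty once their children are removed (such as `b` here) should also go, one approach is to repeat the query until it matches nothing. This is a minimal sketch; `remove_empty_elements_repeatedly` and `EMPTY_LEAVES` are hypothetical names, and the root element is deliberately never removed. Also note that lxml's `remove()` discards the removed element's tail text along with it, which is only whitespace in this example but could matter for mixed content.

```python
from lxml import etree

XML = """<root>
  <a>1</a>
  <b>
    <c></c>
  </b>
  <d/>
  <e>     </e>
  <f>
  </f>
</root>"""

# Leaf elements whose content is empty or whitespace-only
EMPTY_LEAVES = '//*[not(*)][not(normalize-space())]'

def remove_empty_elements_repeatedly(doc):
    """Hypothetical helper: remove whitespace-only leaves until none remain."""
    while True:
        # Skip the root, which has no parent and cannot be removed this way.
        found = [el for el in doc.xpath(EMPTY_LEAVES) if el.getparent() is not None]
        if not found:
            break
        for el in found:
            el.getparent().remove(el)

doc = etree.XML(XML)
print([el.tag for el in doc.xpath(EMPTY_LEAVES)])  # ['c', 'd', 'e', 'f']
remove_empty_elements_repeatedly(doc)
print([el.tag for el in doc.iter()])  # ['root', 'a']
```

The first pass removes `c`, `d`, `e`, and `f`; the second pass then finds `b`, which now holds only whitespace.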

kjhughes answered Dec 03 '25 23:12