How to remove empty XML tags, containing whitespace only, in XML?

Question

I need to remove cases like this:

<text> </text>

I have code that works when the element is completely empty, but what if the element contains only whitespace?

Code:

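from lxml import etree

doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")

def remove_empty_elements(doc):
  # Matches only elements with no child nodes at all (no text, not even whitespace)
  for element in doc.xpath('//*[not(node())]'):
    element.getparent().remove(element)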

I also need to do it with lxml and not BeautifulSoup.

kjhughes · Accepted Answer

This XPath,

//*[not(*)][not(normalize-space())]

will select all leaf elements (elements with no child elements) whose text content is empty or whitespace-only.

For your example specifically,

<root><a>1</a><b><c></c></b><d></d></root>

these elements will be selected: c and d.

For an example that also includes whitespace-only elements,

<root>
  <a>1</a>
  <b>
    <c></c>
  </b>
  <d/>
  <e>     </e>
  <f>
  </f>
</root>

these elements will be selected: c, d, e, and f.
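Since the question asks for lxml, here is a minimal sketch of how this XPath could replace the one in the question's function. The constant name XPATH_EMPTY_LEAVES and the guard against a parentless root are illustrative additions, not part of the original code.

```python
from lxml import etree

# XPath from the answer: leaf elements whose text is empty or whitespace-only
XPATH_EMPTY_LEAVES = "//*[not(*)][not(normalize-space())]"

def remove_empty_elements(root):
    """Remove empty or whitespace-only leaf elements in a single pass."""
    for element in root.xpath(XPATH_EMPTY_LEAVES):
        parent = element.getparent()
        if parent is not None:  # the root element has no parent; leave it alone
            parent.remove(element)
    return root

doc = etree.XML(
    "<root><a>1</a><b><c></c></b><d/><e>     </e><f>\n  </f></root>"
)
remove_empty_elements(doc)
print(etree.tostring(doc).decode())
```

Two things to keep in mind with this sketch. It makes one pass only, so b, which becomes empty once c is removed, stays in the tree; calling the function again until the XPath matches nothing would remove such parents too. Also, lxml's remove() drops the removed element's tail text along with it, which is harmless when the tail is just indentation but matters for mixed content.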

Tags: python, python-3.x, xml, lxml, elementtree

Anna
