How can I prevent lxml from auto-closing empty elements when serializing to string?

Tags: python, lxml

I am parsing a huge XML file that contains many empty elements, such as

<MemoryEnv></MemoryEnv>

When serializing with

etree.tostring(root_element, pretty_print=True)

the output element is collapsed to

<MemoryEnv/>

Is there any way to prevent this? etree.tostring() does not provide such an option.

Is there a way to interfere with lxml's tostring() serializer?

Btw, the html module does not work: it's not designed for XML, and it does not write empty elements in their original form.

The problem is that, although the collapsed and uncollapsed forms of an empty element are equivalent, the program that parses this file won't work with collapsed empty elements.

Petros Makris asked Dec 05 '15 21:12


2 Answers

Serializing with the canonical XML method (c14n) works with lxml: it does not collapse empty elements.

>>> from lxml import etree
>>> s = "<MemoryEnv></MemoryEnv>"
>>> root_element = etree.XML(s)
>>> etree.tostring(root_element, method="c14n")
b'<MemoryEnv></MemoryEnv>'
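
One thing to keep in mind: c14n produces canonical XML, so it does more than keep start and end tags apart. Elements that were already collapsed in the input get expanded too, and attributes are written in sorted order. A small sketch, assuming Python 3 (the `Other` element and its attributes are made up for illustration and are not from the original file):

```python
from lxml import etree

# Illustrative input: one element written with separate start/end tags,
# one already collapsed and carrying attributes in non-sorted order.
source = '<root><MemoryEnv></MemoryEnv><Other z="1" b="2"/></root>'
root_element = etree.XML(source)

# Canonical serialization: every empty element gets an explicit end tag,
# and attributes are emitted in canonical (sorted) order.
canonical = etree.tostring(root_element, method="c14n")
print(canonical)
```

If the consuming program is also sensitive to attribute order or similar details, check the canonical output against what it expects before relying on this.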

Petros Makris answered Sep 29 '22 18:09


Here is a way to do it. Ensure that the text value for all empty elements is not None.

Example:

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

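# An element whose text is None gets collapsed on output;
# an empty string keeps the start and end tags separate.
for elem in doc.iter():
    if elem.text is None:
        elem.text = ''

# encoding="unicode" returns a str, so this prints cleanly on Python 3
print(etree.tostring(doc, encoding="unicode"))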

Output:

<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>
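
If you need this in more than one place, the loop can be wrapped in a small function. The following is only a sketch under a few assumptions: `expand_empty_elements` is a name I made up, it only touches elements without children (elements with children are never collapsed anyway), and it skips comments and processing instructions, which `iter()` also yields.

```python
from lxml import etree


def expand_empty_elements(root):
    """Hypothetical helper: give every childless, text-less element an empty text."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(elem.tag, str):
            continue
        # Only leaf elements without any text can be collapsed by the serializer.
        if len(elem) == 0 and elem.text is None:
            elem.text = ""
    return root


doc = etree.fromstring(
    "<root><MemoryEnv/><foo>bar</foo><parent><child/></parent><!-- note --></root>"
)
result = etree.tostring(expand_empty_elements(doc), encoding="unicode")
print(result)
```

The function modifies the tree in place and returns the same root, so it can be passed straight to etree.tostring().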

An alternative is to use the write_c14n() method to write canonical XML (which does not use the special empty-element syntax) to a file.

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

doc.getroottree().write_c14n("out.xml")

mzjn answered Sep 29 '22 18:09