
Pretty print in lxml is failing when I add tags to a parsed tree

I have an XML file that I'm working with using etree from lxml, but once I add tags to the parsed tree, pretty printing no longer works.

    >>> from lxml import etree
    >>> root = etree.parse('file.xml').getroot()
    >>> print etree.tostring(root, pretty_print=True)
    <root>
      <x>
        <y>test1</y>
      </x>
    </root>

So far so good. But now

    >>> x = root.find('x')
    >>> z = etree.SubElement(x, 'z')
    >>> etree.SubElement(z, 'z1').attrib['value'] = 'val1'
    >>> print etree.tostring(root, pretty_print=True)
    <root>
      <x>
        <y>test1</y>
      <z><z1 value="val1"/></z></x>
    </root>

it's no longer pretty. I've also tried to do it "backwards" where I create the z1 tag, then create the z tag and append z1 to it, then append the z tag to the x tag. But I get the same result.

If I don't parse the file and just create all the tags in one go, it'll print correctly. So I think it has something to do with parsing the file.

How can I get pretty printing to work?

Kris Harper asked Oct 26 '11 14:10



1 Answer

It has to do with how lxml treats whitespace -- see the lxml FAQ for details.
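To see the whitespace in question, here is a small sketch that inspects the text nodes of a parsed tree. It assumes the XML from the question is supplied as an inline string instead of file.xml; the variable names are only for illustration.

```python
from lxml import etree

# The question's document, inlined so the example does not need file.xml.
XML_TEXT = b"""<root>
  <x>
    <y>test1</y>
  </x>
</root>
"""

parsed_root = etree.fromstring(XML_TEXT)
parsed_x = parsed_root.find('x')
parsed_y = parsed_x.find('y')

# The indentation in the source is kept as whitespace-only text nodes.
print(repr(parsed_x.text))  # '\n    '
print(repr(parsed_y.tail))  # '\n  '

# A tree built in code has no such nodes.
built_root = etree.Element('root')
built_x = etree.SubElement(built_root, 'x')
etree.SubElement(built_x, 'y').text = 'test1'
print(repr(built_x.text))  # None
```

Because the parsed tree already carries its own whitespace, the serializer leaves that content as it is rather than re-indenting it, which would explain why a tree built from scratch prints fine while the parsed one does not.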

To fix this, change the loading part of the file to the following:

    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.parse('file.xml', parser).getroot()

I didn't test it, but it should indent your file just fine with this change.
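For anyone who wants to check this, here is a minimal self-contained sketch comparing the two parsers. It assumes the question's XML as an inline string rather than file.xml, and add_z is a hypothetical helper that repeats the edits from the question.

```python
from lxml import etree

# The question's document, inlined so the example does not need file.xml.
XML_TEXT = b"""<root>
  <x>
    <y>test1</y>
  </x>
</root>
"""


def add_z(root):
    """Append <z><z1 value="val1"/></z> under <x>, as in the question."""
    x = root.find('x')
    z = etree.SubElement(x, 'z')
    etree.SubElement(z, 'z1').attrib['value'] = 'val1'
    return root


# Default parser: whitespace-only text nodes from the source are kept.
plain_root = add_z(etree.fromstring(XML_TEXT))
plain_output = etree.tostring(plain_root, pretty_print=True).decode()

# remove_blank_text=True drops those nodes while parsing.
parser = etree.XMLParser(remove_blank_text=True)
clean_root = add_z(etree.fromstring(XML_TEXT, parser))
clean_output = etree.tostring(clean_root, pretty_print=True).decode()

print(plain_output)
print(clean_output)
```

With the default parser the new elements come out crammed onto one line, as in the question. With remove_blank_text=True the z and z1 elements each get their own indented line. Note that tostring returns bytes on Python 3, hence the decode() before printing.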

jro answered Sep 29 '22 18:09