Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

inserting newlines in xml file generated via xml.etree.ElementTree in python

Tags:

python

xml

I have created a xml file using xml.etree.ElementTree in python. I then use

tree.write(filename, "UTF-8")  

to write out the document to a file.

But when I open filename using a text editor, there are no newlines between the tags. Everything is one big line

How can I write out the document in a "pretty printed" format so that there are new lines (and hopefully indentations etc) between all the xml tags?

like image 416
MK. Avatar asked Jun 22 '10 17:06

MK.


People also ask

What is XML Etree ElementTree in Python?

The xml. etree. ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.

What does Etree parse do?

Parsing from strings and files. lxml. etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.


2 Answers

I found a new way to avoid new libraries and reparsing the xml. You just need to pass your root element to this function (see below explanation):

def indent(elem, level=0):     i = "\n" + level*"  "     if len(elem):         if not elem.text or not elem.text.strip():             elem.text = i + "  "         if not elem.tail or not elem.tail.strip():             elem.tail = i         for elem in elem:             indent(elem, level+1)         if not elem.tail or not elem.tail.strip():             elem.tail = i     else:         if level and (not elem.tail or not elem.tail.strip()):             elem.tail = i 

There is an attribute named "tail" on xml.etree.ElementTree.Element instances. This attribute can set an string after a node:

"<a>text</a>tail" 

I found a link from 2004 telling about an Element Library Functions that uses this "tail" to indent an element.

Example:

root = ET.fromstring("<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>""") tree = ET.ElementTree(root)  indent(root) # writing xml tree.write("example.xml", encoding="utf-8", xml_declaration=True) 

Result on "example.xml":

<?xml version='1.0' encoding='utf-8'?> <fruits>     <fruit>banana</fruit>     <fruit>apple</fruit> </fruits> 
like image 159
Erick M. Sprengel Avatar answered Sep 19 '22 13:09

Erick M. Sprengel


The easiest solution I think is switching to the lxml library. In most circumstances you can just change your import from import xml.etree.ElementTree as etree to from lxml import etree or similar.

You can then use the pretty_print option when serializing:

tree.write(filename, pretty_print=True) 

(also available on etree.tostring)

like image 29
Steven Avatar answered Sep 20 '22 13:09

Steven