How to indent attributes when pretty-printing XML in Python?

Suppose I have XML like this:

 <graph label="Test" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cy="http://www.cytoscape.org" xmlns="http://www.cs.rpi.edu/XGMML"  directed="1">
    <foo>...</foo>
 </graph>

The first element's name and all of its attributes appear on one line.

I have seen how to pretty print the element tree, using lxml, with code like this:

from lxml import etree
 ...
def prettyPrintXml(filePath):
    assert filePath is not None
    parser = etree.XMLParser(resolve_entities=False, remove_blank_text=True, 
                             strip_cdata=False)
    document = etree.parse(filePath, parser)
    print(etree.tostring(document, pretty_print=True, encoding='utf-8'))

... but using that, each element still appears on a single line, with all of its attributes.

Is there a magic incantation to tell the pretty printer to insert newlines between the element attributes so that, for example, the line length does not exceed 80 characters?

I would like the result to look something like this:

<graph label="Test"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:cy="http://www.cytoscape.org"
       xmlns="http://www.cs.rpi.edu/XGMML"  directed="1">
  <foo>...</foo>
</graph>

PS: I don't want to resort to subprocess and xmllint.

Cheeso asked Oct 13 '12 17:10



1 Answer

lxml has a pretty-print function built in, and the lxml tutorial describes several ways to print XML. It has some limitations, though (limitations in the XML specs, according to lxml).

Another Stack Overflow question has several answers with more or less hacky solutions for pretty-printing XML, and I think you could adapt at least the regexp-based answer to suit your needs.
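
As a rough illustration of the regexp-based idea, here is a minimal sketch that post-processes text which has already been pretty-printed (for example, the decoded output of etree.tostring(..., pretty_print=True)). The names wrap_attributes, max_width and SAMPLE are made up for this example and are not part of lxml. It assumes each start tag begins its own line, and it will also rewrite tag-like text at the start of a line inside comments or CDATA sections, so treat it as a starting point rather than a robust formatter.

```python
import re

# Hypothetical helper patterns: an XML-ish name and one name="value" pair.
_NAME = r"[\w:.-]+"
_ATTR = _NAME + r"""\s*=\s*(?:"[^"]*"|'[^']*')"""
ATTR = re.compile(_ATTR)

# A start (or empty-element) tag at the beginning of a line, with at least
# one attribute. Namespace declarations are matched like any other attribute.
START_TAG = re.compile(
    r"^(?P<indent>[ \t]*)<(?P<name>" + _NAME + r")"
    r"(?P<attrs>(?:\s+" + _ATTR + r")+)\s*(?P<close>/?>)",
    re.MULTILINE,
)


def wrap_attributes(xml_text, max_width=80):
    """Put each attribute on its own line for start tags wider than max_width."""

    def rewrap(match):
        tag = match.group(0)
        if len(tag) <= max_width:
            return tag  # short enough, leave the tag untouched
        indent, name = match.group("indent"), match.group("name")
        head = indent + "<" + name + " "
        # Continuation lines are aligned under the first attribute.
        pad = "\n" + indent + " " * (len(name) + 2)
        attrs = ATTR.findall(match.group("attrs"))
        return head + pad.join(attrs) + match.group("close")

    return START_TAG.sub(rewrap, xml_text)


SAMPLE = (
    '<graph label="Test" xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xmlns="http://www.cs.rpi.edu/XGMML" directed="1">\n'
    "  <foo>...</foo>\n"
    "</graph>\n"
)

if __name__ == "__main__":
    print(wrap_attributes(SAMPLE))
```

This variant puts every attribute of an over-long tag on its own line instead of packing as many as fit into 80 columns; greedy packing would need a small loop over attrs in place of the single join.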

Fredrik Lundh (of ElementTree fame) has a very low-level description of printing XML, which you could also customize to put each attribute on its own indented line.

Steen answered Oct 22 '22 22:10