Suppose I have XML like this:
<graph label="Test" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cy="http://www.cytoscape.org" xmlns="http://www.cs.rpi.edu/XGMML" directed="1">
<foo>...</foo>
</graph>
The first element name with all its attributes all appear on one line.
I have seen how to pretty print the element tree, using lxml, with code like this:
from lxml import etree
...
def prettyPrintXml(filePath):
assert filePath is not None
parser = etree.XMLParser(resolve_entities=False, remove_blank_text=True,
strip_cdata=False)
document = etree.parse(filePath, parser)
print(etree.tostring(document, pretty_print=True, encoding='utf-8'))
... but using that, every element appears on one line.
Is there a magic incantation to tell the pretty printer to insert newlines between the element attributes so that, for example, the line length does not exceed 80 characaters?
I would like the result to look something like this:
<graph label="Test"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cy="http://www.cytoscape.org"
xmlns="http://www.cs.rpi.edu/XGMML" directed="1">
<foo>...</foo>
</graph>
ps: I don't want to resort to subprocess
and xmllint
Read XML File Using MiniDOM It is Python module, used to read XML file. It provides parse() function to read XML file. We must import Minidom first before using its function in the application. The syntax of this function is given below.
Type “ pip install lxml ” (without quotes) in the command line and hit Enter again. This installs lxml for your default Python installation. The previous command may not work if you have both Python versions 2 and 3 on your computer. In this case, try "pip3 install lxml" or “ python -m pip install lxml “.
lxml
has a pretty print function built in: here's a tutorial which describes several ways to print xml. It has some limitations (limitations in the xml specs, according to lxml), though.
This stackoverflow question has several answers with more or less hacky solutions to pretty print xml, and I think you could model at least the regexp based answer to suit your needs.
Fredrik Lundh (of ElementTree fame) has a very low-level description for printing xml, which you could also customize to newline and indent attributes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With