
Python xml minidom. generate <text>Some text</text> element

I have the following code.

from xml.dom.minidom import Document

doc = Document()

root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)

text = doc.createTextNode('Some text here')
main.appendChild(text)

print doc.toprettyxml(indent='\t')

The result is:

<?xml version="1.0" ?>
<root>
    <Text>
        Some text here
    </Text>
</root>

This is all fine and dandy, but what if I want the output to look like this?

<?xml version="1.0" ?>
<root>
    <Text>Some text here</Text>
</root>

Can this easily be done?

Orjanp...

Orjanp asked Feb 03 '09 14:02



2 Answers

Can this easily be done?

It depends on exactly what rule you want, but generally no: minidom gives you little control over pretty-printing. If you want a specific format you'll usually have to write your own walker.
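To give an idea of what such a walker might look like, here is a minimal sketch written for Python 3. The helper names pretty_lines and pretty_xml are made up for this example and are not part of minidom. It assumes the rule from the question (an element whose only child is a text node stays on one line), drops whitespace-only text nodes, and hard-codes the XML declaration. Mixed content gets its text stripped and placed on its own line, so treat it as a starting point rather than a general serialiser.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

TEXT = minidom.Node.TEXT_NODE
ELEMENT = minidom.Node.ELEMENT_NODE


def pretty_lines(node, indent="    ", level=0):
    """Return the serialised lines for one node (hypothetical helper)."""
    pad = indent * level
    if node.nodeType == TEXT:
        return [pad + escape(node.data.strip())]
    if node.nodeType != ELEMENT:
        # Comments, CDATA sections etc.: fall back to minidom's own output.
        return [pad + node.toxml()]

    attrs = "".join(" %s=%s" % (name, quoteattr(value))
                    for name, value in node.attributes.items())
    # Ignore whitespace-only text nodes left over from earlier indentation.
    children = [c for c in node.childNodes
                if not (c.nodeType == TEXT and not c.data.strip())]

    if not children:
        return ["%s<%s%s/>" % (pad, node.tagName, attrs)]
    if len(children) == 1 and children[0].nodeType == TEXT:
        # The rule from the question: a lone text child stays on one line.
        return ["%s<%s%s>%s</%s>" % (pad, node.tagName, attrs,
                                     escape(children[0].data), node.tagName)]

    lines = ["%s<%s%s>" % (pad, node.tagName, attrs)]
    for child in children:
        lines.extend(pretty_lines(child, indent, level + 1))
    lines.append("%s</%s>" % (pad, node.tagName))
    return lines


def pretty_xml(doc, indent="    "):
    """Serialise a minidom Document using pretty_lines (hypothetical helper)."""
    body = pretty_lines(doc.documentElement, indent)
    return "\n".join(['<?xml version="1.0" ?>'] + body) + "\n"


doc = minidom.parseString("<root><Text>Some text here</Text></root>")
print(pretty_xml(doc))
```

Because you own the walker, changing the indent width or the one-line rule is a local edit rather than a patch to the library.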

The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However it (currently) uses two spaces for an indent rather than four.

>>> doc= pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>c</b>
</a>

(Adjust encoding and output format for whatever type of serialisation you're doing.)

If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, e.g.:

>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):
...     if len(self.childNodes)==1 and self.firstChild.nodeType==3:
...         writer.write(indent)
...         self.oldwritexml(writer) # cancel extra whitespace
...         writer.write(newl)
...     else:
...         self.oldwritexml(writer, indent, addindent, newl)
... 
>>> minidom.Element.oldwritexml= minidom.Element.writexml
>>> minidom.Element.writexml= newwritexml

All the usual caveats about the badness of monkey-patching apply.
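Before reaching for the patch, it may be worth checking whether your Python version needs it at all. As far as I know, later releases changed minidom so that toprettyxml() already keeps an element with a single text child on one line. A quick check, written for Python 3 and reusing the document from the question:

```python
from xml.dom.minidom import Document

# Build the same small document as in the question.
doc = Document()
root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)
main.appendChild(doc.createTextNode('Some text here'))

# Serialise with a four-space indent and inspect the <Text> element.
result = doc.toprettyxml(indent='    ')
print(result)
```

If the printed <Text> element is already on a single line, neither the patch nor a custom walker is needed for this particular rule.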

bobince answered Sep 28 '22 07:09



I was looking for exactly the same thing and came across this post. (The indentation produced by xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed the output to be indented.) I tried the accepted solution with a slightly more complex example, and this was the result:

In [1]: import pxdom

In [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'

In [3]: doc = pxdom.parseString(xml)

In [4]: doc.domConfig.setParameter('format-pretty-print', True)

In [5]: print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>fda</b><c>
    <b>fdsa</b>
  </c>
</a>

The pretty-printed XML isn't formatted correctly (the <c> start tag ends up on the same line as the first <b> element), and I'm not too happy about monkey-patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.

I'm writing the output to a file, so I can use the xmlindent program for Ubuntu ($ sudo aptitude install xmlindent). I just write the unformatted XML to the file, then call xmlindent from within the Python program:

from subprocess import Popen, PIPE
Popen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name], 
      stderr=PIPE, 
      stdout=PIPE).communicate()

The -w switch causes the file to be overwritten, but annoyingly leaves behind a file named e.g. "myfile.xml~", which you'll probably want to remove. The other switches are there to get the formatting right (for me).
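Cleaning up that leftover file can be done from the same script. This is only a sketch: remove_leftover is a made-up helper name, and it assumes the leftover is always the original file name with "~" appended, as in the "myfile.xml~" example.

```python
import os


def remove_leftover(file_name):
    """Delete file_name + '~' if it exists; return True if something was removed."""
    leftover = file_name + "~"  # assumed naming, based on "myfile.xml~"
    if os.path.exists(leftover):
        os.remove(leftover)
        return True
    return False
```

Calling remove_leftover(file_name) right after the Popen(...).communicate() call should then leave only the indented file behind.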

P.S. xmlindent is a stream formatter, i.e. you can use it as follows:

cat myfile.xml | xmlindent > myfile_indented.xml

So you might be able to use it in a Python script without writing to a file, if you need to.
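I haven't needed this myself, but a rough sketch of the stream approach could look like the following (written for Python 3). The filter_through helper is a made-up name. It simply feeds text to a command's standard input and returns what the command writes to standard output, so it only does something useful with xmlindent if xmlindent is installed and on the PATH.

```python
import subprocess


def filter_through(command, xml_text):
    """Pipe xml_text through an external command and return its stdout as text."""
    proc = subprocess.Popen(command,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate(xml_text.encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError("%s failed: %s" % (command[0], err.decode("utf-8", "replace")))
    return out.decode("utf-8")


# Usage, assuming xmlindent is available (equivalent to: cat myfile.xml | xmlindent):
# indented = filter_through(["xmlindent"], doc.toxml())
```

This avoids both the temporary file and the "~" leftover, at the cost of holding the whole document in memory.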

markmuetz answered Sep 28 '22 06:09
