I have an XML writing script that outputs XML for a specific 3rd party tool.
I've used the original XML as a template to make sure that I'm building all the correct elements, but the final XML does not appear like the original.
I write the attributes in the same order, but lxml is writing them in its own order.
I'm not sure, but I suspect that the 3rd-party tool expects attributes to appear in a specific order. I'd like to resolve this so I can see whether it's the attribute order that is making it fail, or something else.
Source element:
<FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="text/x-test-signature">
My source script:
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", ID = str(db.ID), Name = db.name, PUID="fileSig/{}".format(str(db.ID)), Version = "", MIMEType = "")
My resultant XML:
<FileFormat MIMEType="" PUID="fileSig/19" Version="" Name="Printer Info File" ID="19">
Is there a way of constraining the order they are written?
It looks like lxml serializes attributes in the order you set them:
>>> from lxml import etree as ET
>>> x = ET.Element("x")
>>> x.set('a', '1')
>>> x.set('b', '2')
>>> ET.tostring(x)
'<x a="1" b="2"/>'
>>> y= ET.Element("y")
>>> y.set('b', '2')
>>> y.set('a', '1')
>>> ET.tostring(y)
'<y b="2" a="1"/>'
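Note that when you pass attributes as keyword arguments to the ET.SubElement() constructor, Python constructs a dictionary of keyword arguments and passes that dictionary to lxml. On Python versions before 3.6 this loses any ordering you had in the source file, since dictionaries there are unordered (or, rather, their order is determined by string hash values, which may differ from platform to platform or, in fact, from execution to execution). From Python 3.6 on, keyword arguments keep the order in which they were written (PEP 468), so whether the order survives depends on the Python and lxml versions in use.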
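Applied to the element from the question, the set()-based approach could look like the sketch below. The helper name sub_element_with_ordered_attrs and the hard-coded values are made up for illustration and are not part of lxml. The fallback import is also an assumption, added only so the snippet runs where lxml is not installed; the standard-library ElementTree keeps insertion order only on Python 3.8 and later.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the standard library if lxml is missing.
    import xml.etree.ElementTree as etree


def sub_element_with_ordered_attrs(parent, tag, pairs):
    """Create a child of `parent` and set its attributes one by one.

    `pairs` is a list of (name, value) tuples. Hypothetical helper,
    not an lxml function.
    """
    child = etree.SubElement(parent, tag)
    for name, value in pairs:
        child.set(name, value)  # attributes are serialized in set() order
    return child


file_formats = etree.Element("FileFormats")
file_format = sub_element_with_ordered_attrs(
    file_formats,
    "FileFormat",
    [
        ("ID", "19"),
        ("Name", "Printer Info File"),
        ("PUID", "fileSig/19"),
        ("Version", ""),
        ("MIMEType", ""),
    ],
)

serialized = etree.tostring(file_format).decode("ascii")
print(serialized)
```

Passing a list of pairs instead of keyword arguments keeps the intended order explicit regardless of how the interpreter handles keyword dictionaries.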
As of lxml 3.3.3 (perhaps also in earlier versions) you can pass an OrderedDict of attributes to the lxml.etree.(Sub)Element constructor and the order will be preserved when using lxml.etree.tostring(root):
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", OrderedDict([("ID",str(db.ID)), ("Name",db.name), ("PUID","fileSig/{}".format(str(db.ID))), ("Version",""), ("MIMEType","")]))
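Note that the ElementTree API (xml.etree.ElementTree) does not preserve attribute order even if you provide an OrderedDict to the xml.etree.ElementTree.(Sub)Element constructor! (This applies to Python versions before 3.8; from 3.8 on, xml.etree.ElementTree keeps the attribute order specified by the user.)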
UPDATE: Also note that using the **extra parameter of the lxml.etree.(Sub)Element constructor for specifying attributes does not preserve attribute order:
>>> from lxml.etree import Element, tostring
>>> from collections import OrderedDict
>>> root = Element("root", OrderedDict([("b","1"),("a","2")])) # attrib parameter
>>> tostring(root)
b'<root b="1" a="2"/>' # preserved
>>> root = Element("root", b="1", a="2") # **extra parameter
>>> tostring(root)
b'<root a="2" b="1"/>' # not preserved
Attribute ordering and readability
As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:
<tag attr1="val1" attr2="val2"/>
<!-- means the same thing as: -->
<tag attr2="val2" attr1="val1"/>
There is an analogous characteristic in SQL, where column order doesn't change the meaning of a table definition. XML attributes and SQL columns are a set (not an ordered set), and so all that can "officially" be said about either one of those is whether the attribute or column is present in the set.
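A quick way to see this from a parser's point of view is to parse both spellings and compare the resulting attribute mappings. This is a minimal sketch using the standard-library xml.etree.ElementTree; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<tag attr1="val1" attr2="val2"/>')
second = ET.fromstring('<tag attr2="val2" attr1="val1"/>')

# Element.attrib is a dict, and dict equality ignores key order,
# so the two elements carry the same set of attributes.
same_attributes = first.attrib == second.attrib
print(same_attributes)
```

A consumer that only looks at the parsed attributes, as a conforming XML parser exposes them, cannot tell the two elements apart.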
That said, the order in which these things appear definitely makes a difference to human readability. In situations where constructs like this are authored in text (e.g. source code) and must be read and interpreted by people, a careful ordering makes a lot of sense to me.
Typical parser behavior
Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.
As far as I know, lxml has no mechanism for specifying the order attributes appear in serialized XML, and I would be surprised if it did.
In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:
id = 1
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'
template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
            'MIMEType="%s">')
xml = template % (id, name, puid, version, mimetype)
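One caveat with a plain text template is that values containing &, < or quote characters would produce malformed XML. A slightly more defensive sketch could use quoteattr from the standard library's xml.sax.saxutils, which escapes the value and adds the surrounding quotes. The helper name file_format_tag and the sample values are made up for illustration, and the tag is self-closed here only so the result can be parsed back as a check.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def file_format_tag(pairs):
    """Build a <FileFormat .../> tag with attributes in the given order.

    `pairs` is a list of (name, value) tuples. Hypothetical helper.
    """
    attrs = " ".join(
        "{}={}".format(name, quoteattr(str(value))) for name, value in pairs
    )
    return "<FileFormat {}/>".format(attrs)


pairs = [
    ("ID", 1),
    ("Name", 'Tom & "Jerry" Signature'),  # needs escaping
    ("PUID", "dev/1"),
    ("Version", "1.0"),
    ("MIMEType", "text/x-test-signature"),
]

xml_text = file_format_tag(pairs)
print(xml_text)

# Parse the text back to confirm it is well-formed and round-trips.
parsed = ET.fromstring(xml_text)
```

Because the string is assembled by hand, the attribute order is exactly the order of the pairs list, which is what makes this convenient for testing whether the tool cares about order.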