 

python - lxml: enforcing a specific order for attributes

Tags: python, xml, lxml

I have an XML writing script that outputs XML for a specific 3rd party tool.

I've used the original XML as a template to make sure that I'm building all the correct elements, but the final XML does not appear like the original.

I write the attributes in the same order, but lxml is writing them in its own order.

I'm not sure, but I suspect that the 3rd-party tool expects attributes to appear in a specific order, and I'd like to resolve this so I can see whether it's the attribute order that's making it fail, or something else.

Source element:

<FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="text/x-test-signature"> 

My source script:

sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", ID = str(db.ID), Name = db.name, PUID="fileSig/{}".format(str(db.ID)), Version = "", MIMEType = "")

My resultant XML:

<FileFormat MIMEType="" PUID="fileSig/19" Version="" Name="Printer Info File" ID="19">

Is there a way of constraining the order they are written?

Jay Gattuso asked Feb 17 '13 04:02


3 Answers

It looks like lxml serializes attributes in the order you set them:

>>> from lxml import etree as ET
>>> x = ET.Element("x")
>>> x.set('a', '1')
>>> x.set('b', '2')
>>> ET.tostring(x)
'<x a="1" b="2"/>'
>>> y= ET.Element("y")
>>> y.set('b', '2')
>>> y.set('a', '1')
>>> ET.tostring(y)
'<y b="2" a="1"/>'

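Note that when you pass attributes as keyword arguments to ET.SubElement(), Python collects them into a dictionary and passes that dictionary to lxml. On the Python versions current when this was written, that loses any ordering you had in the source file, since dictionaries were unordered (or, rather, their order was determined by string hash values, which may differ from platform to platform or, in fact, from execution to execution). Since Python 3.6, keyword arguments keep the order in which they were passed (PEP 468), so on newer interpreters the result depends on how your lxml version handles that dictionary.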
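
A minimal sketch of the set()-in-order approach, wrapped in a helper. The helper name `sub_element_ordered` and the parent tag "FileFormats" are hypothetical, chosen only for illustration. The import falls back to the standard library so the snippet runs without lxml; that fallback only keeps insertion order on Python 3.8 or newer.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: stdlib fallback, order-preserving only on Python 3.8+
    import xml.etree.ElementTree as etree


def sub_element_ordered(parent, tag, attributes):
    """Create a child of `parent` and set its attributes one at a time.

    `attributes` is a sequence of (name, value) pairs in the desired order.
    """
    child = etree.SubElement(parent, tag)
    for name, value in attributes:
        child.set(name, value)
    return child


root = etree.Element("FileFormats")  # hypothetical parent element
sub_element_ordered(root, "FileFormat", [
    ("ID", "19"),
    ("Name", "Printer Info File"),
    ("PUID", "fileSig/19"),
    ("Version", ""),
    ("MIMEType", ""),
])

# tostring() returns bytes by default in both libraries
serialized = etree.tostring(root).decode("ascii")
print(serialized)
```

Passing a list of pairs rather than keyword arguments keeps the order explicit, so it does not depend on how the interpreter or the library treats dictionaries.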

Marius Gedminas answered Nov 15 '22 19:11


OrderedDict of attributes

As of lxml 3.3.3 (perhaps also in earlier versions) you can pass an OrderedDict of attributes to the lxml.etree.(Sub)Element constructor and the order will be preserved when using lxml.etree.tostring(root):

from collections import OrderedDict

sig.fileformat = etree.SubElement(
    sig.fileformats,
    "FileFormat",
    OrderedDict([
        ("ID", str(db.ID)),
        ("Name", db.name),
        ("PUID", "fileSig/{}".format(str(db.ID))),
        ("Version", ""),
        ("MIMEType", ""),
    ]),
)

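Note that, at the time of writing, the standard library's ElementTree API (xml.etree.ElementTree) does not preserve attribute order even if you provide an OrderedDict to the xml.etree.ElementTree.(Sub)Element constructor: it sorts attributes by name when serializing. Since Python 3.8, it writes attributes in insertion order instead.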

UPDATE: Also note that using the **extra parameter of the lxml.etree.(Sub)Element constructor for specifying attributes does not preserve attribute order:

>>> from lxml.etree import Element, tostring
>>> from collections import OrderedDict
>>> root = Element("root", OrderedDict([("b","1"),("a","2")])) # attrib parameter
>>> tostring(root)
b'<root b="1" a="2"/>' # preserved
>>> root = Element("root", b="1", a="2") # **extra parameter
>>> tostring(root)
b'<root a="2" b="1"/>' # not preserved

Daniel K answered Nov 15 '22 18:11


Attribute ordering and readability

As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:

<tag attr1="val1" attr2="val2"/>

<!-- means the same thing as: -->

<tag attr2="val2" attr1="val1"/>

There is an analogous characteristic in SQL, where column order doesn't change the meaning of a table definition. XML attributes and SQL columns are a set (not an ordered set), and so all that can "officially" be said about either one of those is whether the attribute or column is present in the set.
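
As a quick check of that claim, here is a small sketch using the standard library's xml.etree.ElementTree (chosen only so the snippet runs without extra dependencies). It parses the two spellings of the element above and compares them; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<tag attr1="val1" attr2="val2"/>')
second = ET.fromstring('<tag attr2="val2" attr1="val1"/>')

# Element.attrib is a dict, and dict equality ignores insertion order,
# so the two elements compare equal attribute-for-attribute.
same_meaning = first.tag == second.tag and first.attrib == second.attrib
print(same_meaning)  # True
```

A conforming consumer sees the same name/value pairs either way; only a tool that processes the raw text could tell the two apart.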

That said, the order in which these things appear definitely makes a difference to human readability. Where constructs like this are authored by hand, appear in text (e.g. source code), and must be interpreted by people, a careful ordering makes a lot of sense to me.

Typical parser behavior

Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.

As far as I know, lxml has no mechanism for specifying the order attributes appear in serialized XML, and I would be surprised if it did.

In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:

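file_id = 1  # named file_id rather than id to avoid shadowing the built-in
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'

# Values are interpolated as-is, so they must not contain '"', '&' or '<'.
template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
            'MIMEType="%s">')

xml = template % (file_id, name, puid, version, mimetype)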
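
If the real values might contain quotes or ampersands, a plain template can emit malformed XML. The sketch below is one possible variant, not part of the original answer: it uses xml.sax.saxutils.quoteattr from the standard library to quote and escape each value while keeping a fixed attribute order. The function name `file_format_tag` and the sample name value are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def file_format_tag(file_id, name, puid, version, mimetype):
    """Render an opening <FileFormat> tag with attributes in a fixed order."""
    attributes = [
        ("ID", str(file_id)),
        ("Name", name),
        ("PUID", puid),
        ("Version", version),
        ("MIMEType", mimetype),
    ]
    # quoteattr() escapes the value and adds the surrounding quotes itself
    rendered = " ".join(
        "{}={}".format(key, quoteattr(value)) for key, value in attributes
    )
    return "<FileFormat {}>".format(rendered)


tricky_name = 'Development "Signature" & Co'  # made-up value for illustration
tag = file_format_tag(1, tricky_name, "dev/1", "1.0", "text/x-test-signature")
print(tag)

# Round-trip check: close the element and parse it back
parsed = ET.fromstring(tag + "</FileFormat>")
```

The round trip at the end only confirms that the generated text is well-formed and that the escaped value parses back to the original string.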

scanny answered Nov 15 '22 20:11