Can ElementTree be told to preserve the order of attributes?

Context for this

I'm working with and on a particle physics tool that has a complex but oddly limited configuration system based on XML files. Among the many things set up that way are the paths to various static data files. These paths are hardcoded into the existing XML, there are no facilities for setting or varying them based on environment variables, and in our local installation they are necessarily in a different place.

This isn't a disaster, because the combined source- and build-control tool we're using allows us to shadow certain files with local copies. But even though the data fields are static, the XML isn't, so I've written a script for fixing the paths. The trouble is that, with the attribute rearrangement, diffs between the local and master versions are harder to read than necessary.

This is my first time taking ElementTree for a spin (and only my fifth or sixth Python project), so maybe I'm just doing it wrong.

Abstracted for simplicity the code looks like this:

    tree = elementtree.ElementTree.parse(inputfile)
    i = tree.getiterator()
    for e in i:
        e.text = filter(e.text)
    tree.write(outputfile)

Reasonable or dumb?
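
For anyone running this on Python 3, here is a minimal runnable sketch of the same loop. The two prefixes and the `fix_path` helper are hypothetical stand-ins for the real `filter`, and `tree.iter()` is used because `getiterator()` was removed in Python 3.9.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical prefixes, for illustration only.
OLD_PREFIX = "/master/data"
NEW_PREFIX = "/local/data"


def fix_path(text):
    """Swap the hardcoded prefix for the local one; leave empty text alone."""
    if text is None:
        return None
    return text.replace(OLD_PREFIX, NEW_PREFIX)


def rewrite_paths(infile, outfile):
    """Parse infile, fix the text of every element, and write to outfile."""
    tree = ET.parse(infile)
    for element in tree.iter():
        element.text = fix_path(element.text)
    # encoding="unicode" makes write() emit str, so a text-mode file works.
    tree.write(outfile, encoding="unicode")


# In-memory stand-ins for the real input and output files.
source = io.StringIO(
    '<config><file kind="geometry" id="1">/master/data/geo.dat</file></config>'
)
result = io.StringIO()
rewrite_paths(source, result)
print(result.getvalue())
```

Only element text is touched here, as in the abstracted code above; paths stored in attributes would need a similar pass over `element.attrib`.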

dmckee --- ex-moderator kitten

2 Answers

With help from @bobince's answer and these two (setting attribute order, overriding module methods)

I managed to get this monkey-patched. It's dirty, and I'd suggest using another module that handles this scenario better, but when that isn't a possibility:

    # =======================================================================
    # Monkey patch ElementTree
    import xml.etree.ElementTree as ET

    def _serialize_xml(write, elem, encoding, qnames, namespaces):
        tag = elem.tag
        text = elem.text
        if tag is ET.Comment:
            write("<!--%s-->" % ET._encode(text, encoding))
        elif tag is ET.ProcessingInstruction:
            write("<?%s?>" % ET._encode(text, encoding))
        else:
            tag = qnames[tag]
            if tag is None:
                if text:
                    write(ET._escape_cdata(text, encoding))
                for e in elem:
                    _serialize_xml(write, e, encoding, qnames, None)
            else:
                write("<" + tag)
                items = elem.items()
                if items or namespaces:
                    if namespaces:
                        for v, k in sorted(namespaces.items(),
                                           key=lambda x: x[1]):  # sort on prefix
                            if k:
                                k = ":" + k
                            write(" xmlns%s=\"%s\"" % (
                                k.encode(encoding),
                                ET._escape_attrib(v, encoding)
                                ))
                    #for k, v in sorted(items):  # lexical order
                    for k, v in items: # Monkey patch
                        if isinstance(k, ET.QName):
                            k = k.text
                        if isinstance(v, ET.QName):
                            v = qnames[v.text]
                        else:
                            v = ET._escape_attrib(v, encoding)
                        write(" %s=\"%s\"" % (qnames[k], v))
                if text or len(elem):
                    write(">")
                    if text:
                        write(ET._escape_cdata(text, encoding))
                    for e in elem:
                        _serialize_xml(write, e, encoding, qnames, None)
                    write("</" + tag + ">")
                else:
                    write(" />")
        if elem.tail:
            write(ET._escape_cdata(elem.tail, encoding))

    ET._serialize_xml = _serialize_xml

    from collections import OrderedDict

    class OrderedXMLTreeBuilder(ET.XMLTreeBuilder):
        def _start_list(self, tag, attrib_in):
            fixname = self._fixname
            tag = fixname(tag)
            attrib = OrderedDict()
            if attrib_in:
                for i in range(0, len(attrib_in), 2):
                    attrib[fixname(attrib_in[i])] = self._fixtext(attrib_in[i+1])
            return self._target.start(tag, attrib)

    # =======================================================================

Then in your code:

tree = ET.parse(pathToFile, OrderedXMLTreeBuilder())


answered Sep 16 '22 16:09

SnellyBigoda

Nope. ElementTree uses a dictionary to store attribute values, so it's inherently unordered.

Even DOM doesn't guarantee you attribute ordering, and DOM exposes a lot more detail of the XML infoset than ElementTree does. (There are some DOMs that do offer it as a feature, but it's not standard.)

Can it be fixed? Maybe. Here's a stab at it that replaces the dictionary when parsing with an ordered one (collections.OrderedDict()).

    from xml.etree import ElementTree
    from collections import OrderedDict
    import StringIO

    class OrderedXMLTreeBuilder(ElementTree.XMLTreeBuilder):
        def _start_list(self, tag, attrib_in):
            fixname = self._fixname
            tag = fixname(tag)
            attrib = OrderedDict()
            if attrib_in:
                for i in range(0, len(attrib_in), 2):
                    attrib[fixname(attrib_in[i])] = self._fixtext(attrib_in[i+1])
            return self._target.start(tag, attrib)

    >>> xmlf = StringIO.StringIO('<a b="c" d="e" f="g" j="k" h="i"/>')

    >>> tree = ElementTree.ElementTree()
    >>> root = tree.parse(xmlf, OrderedXMLTreeBuilder())
    >>> root.attrib
    OrderedDict([('b', 'c'), ('d', 'e'), ('f', 'g'), ('j', 'k'), ('h', 'i')])

Looks potentially promising.

    >>> s = StringIO.StringIO()
    >>> tree.write(s)
    >>> s.getvalue()
    '<a b="c" d="e" f="g" h="i" j="k" />'

Bah, the serialiser outputs them in canonical order.

This looks like the line to blame, in ElementTree._write:

            items.sort() # lexical order

Subclassing or monkey-patching that is going to be annoying as it's right in the middle of a big method.

Unless you did something nasty like subclass OrderedDict and hack items to return a special subclass of list that ignores calls to sort(). Nah, probably that's even worse and I should go to bed before I come up with anything more horrible than that.
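
A note for later Python versions: as far as I know, since Python 3.8 the serialiser no longer sorts attributes, and plain dicts keep insertion order, so the round trip may need no patching at all there. A quick check, assuming Python 3.8 or newer and using only the standard parser:

```python
import xml.etree.ElementTree as ET

# Same sample document as in the interactive session above.
source = '<a b="c" d="e" f="g" j="k" h="i"/>'

root = ET.fromstring(source)

# Attribute names in the order the parser stored them.
attribute_order = list(root.attrib)

# encoding="unicode" returns a str instead of bytes.
serialized = ET.tostring(root, encoding="unicode")

print(attribute_order)
print(serialized)
```

On 3.8 and later this should print the attributes in document order (b, d, f, j, h) both times. On older versions the sorted output shown above still applies, so the patching approach remains the fallback.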

answered Sep 16 '22 16:09

bobince


Tags: python, xml, elementtree