Elementtree setting attribute order

I am trying to write a Python script to standardise generic XML files that are used to configure websites and website forms. To do this, I would like either to maintain the original attribute ordering of the elements or, even better, to be able to rearrange the attributes in a predefined way. Currently, most XML parsers I have tried rewrite the attributes in alphabetical order. As these XML files are read, written and maintained by humans, this isn't very useful.

For example, a generic element may look like this in the XML:

<Question QuestionRef="XXXXX" DataType="Integer" Text="Question Text" Availability="Shown" DefaultAnswer="X">

However, once passed through ElementTree and rewritten to a new file, this is changed to:

<Question Availability="Shown" DataType="Integer" DefaultAnswer="X" QuestionRef="XXXXX" Text="Question Text">
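The round trip described above can be reproduced with a few lines. This is only a minimal sketch: `SOURCE` and `round_trip` are names made up for illustration, and the element is written as self-closing so that it parses on its own. Whether the attribute order changes depends on the Python version: releases before 3.8 sort attributes on output, while 3.8 and later keep the order in which they were set.

```python
import xml.etree.ElementTree as ET

# Illustrative input based on the example element above (made self-closing
# so that it is a complete, parseable document).
SOURCE = (
    '<Question QuestionRef="XXXXX" DataType="Integer" '
    'Text="Question Text" Availability="Shown" DefaultAnswer="X" />'
)


def round_trip(xml_text):
    """Parse an XML string and serialise it again with ElementTree."""
    root = ET.fromstring(xml_text)
    # encoding="unicode" makes tostring() return str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(SOURCE)
    print(round_trip(SOURCE))
```

Comparing the two printed lines shows whether the installed ElementTree reorders attributes.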

The aim of the script is to standardise a large number of XML files in order to increase readability between colleagues, and the information contained within an element's attributes has varying levels of significance (e.g. QuestionRef is highly important), so the attributes need to be sensibly ordered.

I understand that Python dicts (which attributes are stored in) are naturally unordered and that the XML specification states attribute ordering is insignificant, but human readability is the driving force behind the script.

In other questions (on Stack Overflow) similar to this one I have seen it remarked that pxdom can do this (question link: link), but I cannot find any mention of how it might do this in the pxdom documentation or through a Google search. So is there some way to maintain the order of attributes, or to define it, with current XML parsers? Preferably without resorting to hotpatching :)!

Any help anyone can provide would be greatly appreciated :).

Piers Lillystone asked Jan 10 '13 12:01


1 Answer

Apply the patch described below. In the ElementTree.py file there is a function named _serialize_xml; inside this function, make the following change:

        # Original line was: for k, v in sorted(items):
        # sorted() is removed so attributes are written in stored order.
        for k, v in items:
            if isinstance(k, QName):
                k = k.text
            if isinstance(v, QName):
                v = qnames[v.text]
            else:
                v = _escape_attrib(v, encoding)
            write(" %s=\"%s\"" % (qnames[k], v))

Here, replace sorted(items) with plain items, as I have done above.

The patch above works fine when the attributes have no namespace, but sorting still takes place when an XML attribute has a namespace. To disable that as well, replace every {} in ElementTree.py with collections.OrderedDict().

Now all attributes are written in the order in which you added them to the XML element.

Before doing any of the above, read the copyright message by Fredrik Lundh in ElementTree.py.
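If editing the library file is not an option, the "predefined order" part of the question can be sketched without any patch on Python 3.8 or later, where ElementTree writes attributes in the order in which they were set. The sketch below is only an illustration under that assumption: `PREFERRED_ORDER`, `reorder_attributes` and `reorder_tree` are hypothetical names, not part of ElementTree, and the preferred order itself is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical preferred order; attributes not listed here keep their
# existing relative order and are written after the listed ones.
PREFERRED_ORDER = ["QuestionRef", "Text", "DataType"]


def reorder_attributes(elem, preferred=PREFERRED_ORDER):
    """Re-insert the attributes of elem so preferred names come first."""
    remaining = dict(elem.attrib)  # copy that keeps the current order
    elem.attrib.clear()
    for name in preferred:
        if name in remaining:
            elem.set(name, remaining.pop(name))
    for name, value in remaining.items():
        elem.set(name, value)


def reorder_tree(root, preferred=PREFERRED_ORDER):
    """Apply reorder_attributes to root and every descendant element."""
    for elem in root.iter():
        reorder_attributes(elem, preferred)


root = ET.fromstring(
    '<Question Availability="Shown" DataType="Integer" DefaultAnswer="X" '
    'QuestionRef="XXXXX" Text="Question Text" />'
)
reorder_tree(root)
print(ET.tostring(root, encoding="unicode"))
```

On interpreters older than 3.8 the serializer still sorts attributes, so this sketch has no visible effect there and the patch above remains the way to go.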

namit answered Nov 15 '22 19:11