Using python, sort XML alphabetically except one element

Question

I'm trying to sort my XML alphabetically while ensuring that a specific element stays at the top. I have managed to sort it alphabetically, but I cannot get that element to stay. Here is what I have so far:

from lxml import etree

data = """
<Example xmlns="http://www.example.org">
    <E>
        <A>A</A>
        <B>B</B>
        <C>C</C>
    </E>
    <B>B</B>
    <D>D</D>
    <A>A</A>
    <C>C</C>
    <F>F</F>
</Example>
"""
doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,key=lambda x: x.tag)

print(etree.tostring(doc, pretty_print=True).decode())

The result from this is:

<Example xmlns="http://www.example.org">
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <F>F</F>
</Example>

Is there any way I can stop the <E></E> element and its contents from moving?

James · Accepted Answer

You can handle this in at least two ways. You can sort everything with a custom key function that forces <E> to the top. Alternatively, you can separate out the elements to be sorted, sort them, and append them after the elements that should not be sorted.

Custom sort:

Strings are sorted by comparing code points, character by character. You can get the code point of a single character using ord(). The tab character (code point 9) is lower than any character that can appear in a tag, so we can tell Python to sort all of the elements by tag as usual, unless the tag is <E>, in which case the key is a tab, which sorts first.

Because the document declares a default namespace, lxml reports each tag as {namespace}name, so there is some extra code to build the full tag from doc.nsmap.

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))
ns = doc.nsmap

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent, key=lambda x: x.tag if x.tag != '{' + ns[None] + '}E' else '\t')

print(etree.tostring(doc,pretty_print=True).decode('ascii'))

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
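
To see why the tab key works, here is a minimal sketch on plain strings, with no lxml involved. The names NS, PINNED and sort_key are made up for this illustration; the tags are written the way lxml reports them for the sample document.

```python
NS = "http://www.example.org"
PINNED = "{%s}E" % NS  # hypothetical name for the tag that must stay first

# Tags in the original child order of <Example>, in {namespace}name form.
tags = ["{%s}%s" % (NS, name) for name in "EBDACF"]

def sort_key(tag):
    # Tab is code point 9 and '{' is code point 123, so the pinned
    # tag's key compares lower than every namespaced tag.
    return "\t" if tag == PINNED else tag

ordered = sorted(tags, key=sort_key)
local_names = [t.split("}")[1] for t in ordered]

print(ord("\t"), ord("{"))  # 9 123
print(local_names)          # ['E', 'A', 'B', 'C', 'D', 'F']
```

Note that every <E> element gets the same key, so if a parent had several of them they would all move to the front and keep their relative order, since Python's sort is stable.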

Split, apply, combine

Here we split each parent's children into two groups, sort only the group that should be sorted, and then concatenate the two with the unsorted group first.

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))
ns = doc.nsmap
for parent in doc.xpath('//*[./*]'):
    to_sort = (e for e in parent if e.tag!='{'+ns[None]+'}E')
    non_sort = (e for e in parent if e.tag=='{'+ns[None]+'}E')
    parent[:] = list(non_sort) + sorted(to_sort, key=lambda e: e.tag)
print(etree.tostring(doc,pretty_print=True).decode('ascii'))

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
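
If lxml is not available, the same split, sort and combine idea can be sketched with the standard library's xml.etree.ElementTree, which uses the same {namespace}name tags. This is only an illustration under that assumption: the function name sort_children_except is made up, and since ElementTree has no xpath(), it walks the tree with iter() instead.

```python
import xml.etree.ElementTree as ET

DATA = """<Example xmlns="http://www.example.org">
    <E><A>A</A><B>B</B><C>C</C></E>
    <B>B</B>
    <D>D</D>
    <A>A</A>
    <C>C</C>
    <F>F</F>
</Example>"""

def sort_children_except(root, pinned_tag):
    """Sort each element's children by tag, keeping pinned_tag elements first."""
    # Snapshot the elements first so reordering does not disturb iteration.
    for parent in list(root.iter()):
        children = list(parent)
        if not children:
            continue  # leaf element, nothing to sort
        pinned = [c for c in children if c.tag == pinned_tag]
        rest = sorted((c for c in children if c.tag != pinned_tag),
                      key=lambda c: c.tag)
        parent[:] = pinned + rest

root = ET.fromstring(DATA)
sort_children_except(root, "{http://www.example.org}E")

order = [child.tag.split("}")[1] for child in root]
print(order)  # ['E', 'A', 'B', 'C', 'D', 'F']
```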

PRMoureu · Answer

It can work the following way. The short tag name is not directly available because of the namespace, so the key uses the long tag, including the xmlns part:

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,
                       key=lambda x: (not x.tag =='{http://www.example.org}E', x.tag))

print(etree.tostring(doc, pretty_print=True, encoding='unicode'))

This code will output:

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>

The following code simply prints these long tags to show what they look like:

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    for item in parent:
        print(item.tag)

{http://www.example.org}E
{http://www.example.org}B
{http://www.example.org}D
{http://www.example.org}A
{http://www.example.org}C
{http://www.example.org}F
{http://www.example.org}A
{http://www.example.org}B
{http://www.example.org}C

Another way is to use a helper function that strips the namespace from the tag, which makes the comparison more readable:

def normalize(name):
    if name[0] == "{":
        uri, tag = name[1:].split("}")
        return tag
    else:
        return name

doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,
                       key=lambda x: (not normalize(x.tag) == 'E', x.tag))
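
For anyone wondering why the tuple key works: Python compares tuples item by item, and False sorts before True, so the first item decides whether an element is pinned and the second orders the rest alphabetically. Here is a small sketch on plain strings; local_name and sort_key are illustrative names, not part of lxml.

```python
NS = "{http://www.example.org}"

# Tags in the original child order of <Example>, in {namespace}name form.
tags = [NS + name for name in "EBDACF"]

def local_name(tag):
    # "{uri}name" -> "name"; a tag without a namespace is returned unchanged.
    return tag.rpartition("}")[2]

def sort_key(tag):
    # First item: False for the pinned tag, True for everything else,
    # and False < True, so the pinned tag comes first.
    # Second item: the full tag, which orders the remaining elements.
    return (local_name(tag) != "E", tag)

ordered = [local_name(t) for t in sorted(tags, key=sort_key)]

print(sort_key(NS + "E")[0], sort_key(NS + "A")[0])  # False True
print(ordered)  # ['E', 'A', 'B', 'C', 'D', 'F']
```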

Tags: python, sorting, xml, alphabetical

Uwot12
