Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using python, sort XML alphabetically except one element

I'm trying to sort my XML alphabetically while ensuring that a specific element stays at the top. I have managed to sort it alphabetically, but I cannot get that element to stay. Here is what I have so far:

from lxml import etree

data = """
<Example xmlns="http://www.example.org">
    <E>
        <A>A</A>
        <B>B</B>
        <C>C</C>
    </E>
    <B>B</B>
    <D>D</D>
    <A>A</A>
    <C>C</C>
    <F>F</F>
</Example>
"""
doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,key=lambda x: x.tag)

print etree.tostring(doc,pretty_print=True)

The result from this is:

<Example xmlns="http://www.example.org">
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <E>
    <A>A</A>
    <B>B</B>
    <C>1</C>
  </E>
  <F>F</F>
</Example>

Is there anyway I can stop the <E></E> part and its contents from moving?

like image 322
Uwot12 Avatar asked Jun 18 '26 11:06

Uwot12


2 Answers

You can handle this in at least 2 ways. You could sort everything, and then force <E> to the top through a custom sorting function. Also, you could split the elements to-be-sorted out, sort them, and append them to the end of the non-sorted elements.

Custom sort:

Sorting for text occurs using progressive code points. You can get the code point for a single character using ord(). The lowest printed character is the tab. So for sorting we can tell python to sort all of the elements normally, unless the tag is <E>, then use a tab for sorting which will get sorted first.

There is some extra code to handle the namespace.

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))
ns = doc.nsmap

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,key=lambda x: x.tag if x.tag!='{'+ns[None]+'}E' else '\t')

print(etree.tostring(doc,pretty_print=True).decode('ascii'))

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>

Split, apply, combine

Here we split the parent into two lists, sort the second list, and then merge them.

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))
ns = doc.nsmap
for parent in doc.xpath('//*[./*]'):
    to_sort = (e for e in parent if e.tag!='{'+ns[None]+'}E')
    non_sort = (e for e in parent if e.tag=='{'+ns[None]+'}E')
    parent[:] = list(non_sort) + sorted(to_sort, key=lambda e: e.tag)
print(etree.tostring(doc,pretty_print=True).decode('ascii'))

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
like image 84
James Avatar answered Jun 20 '26 02:06

James


It could work with the following way, but it seems the simple tag cannot be reached, so it uses the long tag, including the xmlns part :

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

    for parent in doc.xpath('//*[./*]'):
        parent[:] = sorted(parent,
                           key=lambda x: (not x.tag =='{http://www.example.org}E', x.tag))

    print(etree.tounicode(doc,pretty_print=True))

This code will output :

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
   </Example>\n'

The following code just outputs these long tags to understand what they look like :

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

    for parent in doc.xpath('//*[./*]'):
        for item in parent:
            print(item.tag)

    {http://www.example.org}E
    {http://www.example.org}B
    {http://www.example.org}D
    {http://www.example.org}A
    {http://www.example.org}C
    {http://www.example.org}F
    {http://www.example.org}A
    {http://www.example.org}B
    {http://www.example.org}C

Another way is to use an helper function to parse the tag to make it more readable :

def normalize(name):
    if name[0] == "{":
        uri, tag = name[1:].split("}")
        return tag
    else:
        return name

doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,
                       key=lambda x: (not normalize(x.tag) == 'E', x.tag))
like image 44
PRMoureu Avatar answered Jun 20 '26 01:06

PRMoureu



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!