
Sorting child elements with lxml based on attribute value

I'm trying to sort some child elements in a document by an attribute value. The sorted() call itself seems to work, but splicing the newly sorted elements back into the tree does not.

from lxml import etree

def getkey(elem):
    # Used for sorting elements by @LIN.
    # returns a tuple of ints from the exploded @LIN value
    # '1.0' -> (1,0)
    # '1.0.1' -> (1,0,1)
    return tuple([int(x) for x in elem.get('LIN').split('.')])

xml_str = """<Interface>
                <Header></Header>
                <PurchaseOrder>
                    <LineItems>
                        <Line LIN="2.0"></Line>
                        <Line LIN="3.0"></Line>
                        <Line LIN="1.0"></Line>
                    </LineItems>
                </PurchaseOrder>
            </Interface>"""

root = etree.fromstring(xml_str)
lines = root.findall("PurchaseOrder/LineItems/Line")
lines[:] = sorted(lines, key=getkey)
res_lines = [x.get('LIN') for x in lines]
print(res_lines)

print(etree.tostring(root, pretty_print=True).decode())

When I run the code above, the lines list is sorted correctly: it prints ['1.0', '2.0', '3.0']. However, the XML tree isn't updated, as tostring() prints the following:

<Interface>
  <Header/>
  <PurchaseOrder>
    <LineItems>
      <Line LIN="2.0"/>
      <Line LIN="3.0"/>
      <Line LIN="1.0"/>
    </LineItems>
  </PurchaseOrder>
</Interface>

I got the idea for how to sort from http://effbot.org/zone/element-sort.htm, which says that the slice assignment should be all I need to update the element order, but that doesn't seem to be the case. I realise lxml is not 100% compatible with ElementTree, so as a sanity check I replaced the lxml import with ElementTree and got exactly the same results.

Ryan Parrish asked Mar 15 '16 17:03


1 Answer

This will sort and write the output:

import xml.etree.ElementTree as ET

tree = ET.parse("in.xml")

def getkey(elem):
    # Used for sorting elements by @LIN.
    # Returns the @LIN value as a float: '1.0' -> 1.0
    # Note: float() cannot parse multi-part values such as '1.0.1';
    # use the tuple-of-ints key from the question for those.
    return float(elem.get('LIN'))

container = tree.find("PurchaseOrder/LineItems")

container[:] = sorted(container, key=getkey)

tree.write("new.xml")
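
A caveat about the sort key, offered as a sketch and not as part of the answer: the getkey above uses float(), which works for values such as '2.0' but raises ValueError on a three-part value such as '1.0.1'. If the parts are meant to be separate integers, it also orders '1.10' before '1.9'. The tuple-of-ints key from the question handles both cases. The names float_key and tuple_key and the sample values below are illustrative only.

```python
def float_key(value):
    # Key style used in the answer: the whole string parsed as one float.
    return float(value)


def tuple_key(value):
    # Key style used in the question: '1.0.1' -> (1, 0, 1)
    return tuple(int(part) for part in value.split("."))


# Made-up sample values where the two keys disagree.
two_part = ["1.10", "1.9", "1.2"]
sorted_by_float = sorted(two_part, key=float_key)
sorted_by_tuple = sorted(two_part, key=tuple_key)

# float() rejects a value with more than one dot.
try:
    float_key("1.0.1")
    float_handles_three_parts = True
except ValueError:
    float_handles_three_parts = False

print(sorted_by_float)            # ['1.10', '1.2', '1.9']
print(sorted_by_tuple)            # ['1.2', '1.9', '1.10']
print(float_handles_three_parts)  # False
```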

Or using your own code to print:

from lxml import etree

def getkey(elem):
    # Used for sorting elements by @LIN.
    # Returns the @LIN value as a float: '1.0' -> 1.0
    return float(elem.get('LIN'))

# xml_str is the document string from the question.
root = etree.fromstring(xml_str)
# Find the parent element, not the list of its children.
lines = root.find("PurchaseOrder/LineItems")
# Slice-assigning to the element replaces its children in sorted order.
lines[:] = sorted(lines, key=getkey)

Output:

In [12]: print (etree.tostring(root, pretty_print=True))
        <Interface>
            <Header/>
                <PurchaseOrder>
                    <LineItems>
                        <Line LIN="1.0"/>
                    <Line LIN="2.0"/>
                        <Line LIN="3.0"/>
                        </LineItems>
                </PurchaseOrder>
            </Interface>
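
The uneven indentation in that output is expected: the whitespace that follows each Line element is stored as that element's tail, so it moves with the element when the children are reordered. One possible way to tidy it up is to re-indent after sorting. The sketch below assumes Python 3.9 or later, where xml.etree.ElementTree.indent() is available (as far as I know, lxml 4.5 and later offer a similar etree.indent()). The shortened document string is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative document with indentation whitespace.
xml_str = """<LineItems>
    <Line LIN="2.0"/>
    <Line LIN="1.0"/>
</LineItems>"""

root = ET.fromstring(xml_str)
root[:] = sorted(root, key=lambda e: float(e.get("LIN")))

# Each element kept its original tail whitespace, so the layout is uneven.
before = ET.tostring(root, encoding="unicode")

# ET.indent() (Python 3.9+) rewrites the text/tail whitespace in place.
ET.indent(root, space="  ")
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

This only changes whitespace between elements; the element order produced by the sort is untouched.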

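The key is root.find("PurchaseOrder/LineItems"): you want to find the LineItems element itself and sort that. root.findall(...) returns an ordinary Python list of the Line elements, so assigning to lines[:] only reorders that list and leaves the tree untouched. Slice-assigning to the LineItems element replaces its actual children.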
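
The difference between sorting the list returned by findall() and sorting the LineItems element itself can be checked directly. This is a minimal sketch that uses the standard-library xml.etree.ElementTree so it runs without lxml (the question reports the same behaviour with both). The shortened document string and the helper child_order are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = '<LineItems><Line LIN="2.0"/><Line LIN="3.0"/><Line LIN="1.0"/></LineItems>'


def getkey(elem):
    # '1.0' -> (1, 0); '1.0.1' -> (1, 0, 1)
    return tuple(int(part) for part in elem.get("LIN").split("."))


def child_order(parent):
    # Hypothetical helper: the LIN values of the children in document order.
    return [child.get("LIN") for child in parent]


container = ET.fromstring(XML)

# Case 1: findall() returns a plain Python list, detached from the tree.
lines = container.findall("Line")
lines[:] = sorted(lines, key=getkey)  # reorders the list only
order_after_list_sort = child_order(container)

# Case 2: slice-assign to the parent element itself.
container[:] = sorted(container, key=getkey)  # replaces the element's children
order_after_element_sort = child_order(container)

print(order_after_list_sort)     # ['2.0', '3.0', '1.0']
print(order_after_element_sort)  # ['1.0', '2.0', '3.0']
```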

Padraic Cunningham answered Oct 18 '22 13:10