
Recursive XML parsing python using ElementTree

I'm trying to parse the XML below using Python ElementTree to produce the output shown further down. I'm trying to write modules for the top-level elements to print them. However, it is slightly tricky because a category element may or may not have properties, and a category element may contain another category element.

I've looked at previous questions on this topic, but they did not deal with nested elements that share the same name.

My Code: http://pastebin.com/Fsv2Xzqf

work.xml:
<suite id="1" name="MainApplication">
    <displayNameKey>my Application</displayNameKey>
    <displayName>my Application</displayName>
    <application id="2" name="Sub Application1">
        <displayNameKey>sub Application1</displayNameKey>
        <displayName>sub Application1</displayName>
        <category id="2423" name="about">
            <displayNameKey>subApp.about</displayNameKey>
            <displayName>subApp.about</displayName>
            <category id="2423" name="comms">
                <displayNameKey>subApp.comms</displayNameKey>
                <displayName>subApp.comms</displayName>
                <property id="5909" name="copyright" type="string_property" width="40">
                    <value>2014</value>
                </property>
                <property id="5910" name="os" type="string_property" width="40">
                    <value>Linux 2.6.32-431.29.2.el6.x86_64</value>
                </property>
            </category>
            <property id="5908" name="releaseNumber" type="string_property" width="40">
                <value>9.1.0.3.0.54</value>
            </property>
        </category>
    </application>
</suite>

Output should be as below:

Suite: MainApplication
    Application: Sub Application1
        Category: about
            property: releaseNumber | 9.1.0.3.0.54
            category: comms
                property: copyright | 2014
                property: os | Linux 2.6.32-431.29.2.el6.x86_64

Any pointers in the right direction would help.

Ara, asked Jan 28 '15 14:01


2 Answers

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='work.xml')

indent = 0
ignoreElems = ['displayNameKey', 'displayName']

def printRecur(root):
    """Recursively prints the tree."""
    # The global declaration has to come before the first use of indent.
    global indent
    if root.tag in ignoreElems:
        return
    # Show the name attribute if there is one, otherwise the element's text.
    print(' ' * indent + '%s: %s' % (root.tag.title(), root.attrib.get('name', root.text)))
    indent += 4
    # Iterate over the element directly; getchildren() was removed in Python 3.9.
    for elem in root:
        printRecur(elem)
    indent -= 4

root = tree.getroot()
printRecur(root)

OUTPUT:

Suite: MainApplication
    Application: Sub Application1
        Category: about
            Category: comms
                Property: copyright
                    Value: 2014
                Property: os
                    Value: Linux 2.6.32-431.29.2.el6.x86_64
            Property: releaseNumber
                Value: 9.1.0.3.0.54

This is the closest I could get in 5 minutes. The idea is simply to call a processing function recursively on each child, and the nesting takes care of itself. You can improve on it from this point :)
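One possible improvement on the snippet above is to drop the global counter and pass the depth down as an argument instead. What follows is only a sketch, assuming Python 3: `format_tree`, `IGNORED`, `step` and the shortened inline `SAMPLE` document are names made up for illustration and are not part of the original code.

```python
import xml.etree.ElementTree as ET

# Tags to skip entirely (hypothetical name, mirrors ignoreElems above).
IGNORED = {'displayNameKey', 'displayName'}

def format_tree(elem, depth=0, step=4):
    """Return one indented line per element, children one level deeper."""
    if elem.tag in IGNORED:
        return []
    # Prefer the name attribute, fall back to the element's own text.
    label = elem.attrib.get('name') or (elem.text or '').strip()
    lines = [' ' * (depth * step) + '%s: %s' % (elem.tag.title(), label)]
    for child in elem:
        lines.extend(format_tree(child, depth + 1, step))
    return lines

# Shortened stand-in for work.xml so the sketch runs on its own.
SAMPLE = """<suite id="1" name="MainApplication">
  <displayName>my Application</displayName>
  <category id="2423" name="about">
    <property id="5908" name="releaseNumber" type="string_property" width="40">
      <value>9.1.0.3.0.54</value>
    </property>
  </category>
</suite>"""

print('\n'.join(format_tree(ET.fromstring(SAMPLE))))
```

Returning a list of lines rather than printing inside the recursion makes the function easier to test and keeps the indentation state local to each call.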


You can also define a handler function for each tag and put them all in a dictionary for easy lookup. Then, for each element, check whether a handler is registered for its tag: if so, call it; otherwise fall back to the generic printing. For example:

# Python 3; reuses indent and ignoreElems from the first snippet.
def handle_property(root):
    """Takes property root element and prints the values."""
    data = ' ' * indent + '%s: %s ' % (root.tag.title(), root.attrib['name'])
    # Collect the text of every <value> child.
    values = [elem.text or '' for elem in root if elem.tag == 'value']
    print(data + '| %s' % ', '.join(values))

# Maps a tag name to its handler function; add one entry per special tag.
HANDLERS = {
    'property': handle_property,
}

# printRecur would get modified accordingly.
def printRecur(root):
    """Recursively prints the tree."""
    global indent
    if root.tag in ignoreElems:
        return

    if root.tag in HANDLERS:
        # A dedicated handler prints the element together with its children.
        HANDLERS[root.tag](root)
        return

    print(' ' * indent + '%s: %s' % (root.tag.title(), root.attrib.get('name', root.text)))
    # Only the children are indented one level deeper.
    indent += 4
    for elem in root:
        printRecur(elem)
    indent -= 4

Output with above:

Suite: MainApplication
    Application: Sub Application1
        Category: about
            Category: comms
                Property: copyright | 2014
                Property: os | Linux 2.6.32-431.29.2.el6.x86_64
            Property: releaseNumber | 9.1.0.3.0.54

I find this much more manageable than putting tons of if/else in the code.
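The question's desired output lists a category's own properties before its nested categories, which the output above does not do. The same lookup-table idea can be stretched to cover that. This is a rough sketch under my own assumptions, not a tested solution for the real work.xml: `render_property`, `render_container`, `RENDERERS`, `render` and the trimmed `SAMPLE` are all invented names, and tags without an entry in the table (such as displayName) are simply skipped.

```python
import xml.etree.ElementTree as ET

def render_property(elem, depth):
    """Render a <property> and its <value> children on a single line."""
    values = [v.text or '' for v in elem.findall('value')]
    return [' ' * depth + 'Property: %s | %s' % (elem.get('name'), ', '.join(values))]

def render_container(elem, depth):
    """Render a suite/application/category, properties before nested containers."""
    lines = [' ' * depth + '%s: %s' % (elem.tag.title(), elem.get('name'))]
    # Children whose tag has no renderer are ignored.
    children = [c for c in elem if c.tag in RENDERERS]
    # sorted() is stable, so document order is kept within each group.
    for child in sorted(children, key=lambda c: c.tag != 'property'):
        lines.extend(RENDERERS[child.tag](child, depth + 4))
    return lines

# Tag name -> function(elem, depth) returning a list of output lines.
RENDERERS = {
    'suite': render_container,
    'application': render_container,
    'category': render_container,
    'property': render_property,
}

def render(root):
    """Render a whole tree starting at indentation 0."""
    return RENDERERS[root.tag](root, 0)

# Trimmed stand-in for the nested categories in work.xml.
SAMPLE = """<category id="2423" name="about">
  <displayName>subApp.about</displayName>
  <category id="2423" name="comms">
    <property id="5909" name="copyright" type="string_property" width="40">
      <value>2014</value>
    </property>
  </category>
  <property id="5908" name="releaseNumber" type="string_property" width="40">
    <value>9.1.0.3.0.54</value>
  </property>
</category>"""

print('\n'.join(render(ET.fromstring(SAMPLE))))
```

Storing the functions themselves in the dictionary avoids a lookup by name, at the cost of having to define the table after the functions it refers to.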

Jatin Kumar, answered Nov 16 '22 01:11


If you want a barebones XML recursive tree parser snippet:

from xml.etree import ElementTree

tree = ElementTree.parse('english_saheeh.xml')
root = tree.getroot()

def walk_tree_recursive(root):
    # Do whatever you need with root.tag here.
    for child in root:
        walk_tree_recursive(child)

walk_tree_recursive(root)
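If the nesting level matters (for indentation, say), the same walk can be written as a generator that yields each element together with its depth. This is a small sketch assuming Python 3; `walk` and the tiny inline document are illustrative names only. When the depth is not needed, the standard `Element.iter()` method already visits every element in document order.

```python
import xml.etree.ElementTree as ET

def walk(elem, depth=0):
    """Yield (depth, element) pairs in document order."""
    yield depth, elem
    for child in elem:
        yield from walk(child, depth + 1)

# Tiny inline document so the sketch runs without a file.
doc = ET.fromstring('<a><b><c/></b><d/></a>')

for depth, elem in walk(doc):
    print(' ' * (4 * depth) + elem.tag)

# Flat traversal with the built-in iterator, no depth information.
flat_tags = [elem.tag for elem in doc.iter()]
```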

Omar Khan, answered Nov 16 '22 02:11