Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - How to determine hierarchy level of parsed XML elements?

I am trying to parse elements with certain tag from XML file with Python and generate output excel document, which would contain elements and also preserve their hierarchy.

My problem is that I cannot figure out how deeply nested each element (over which parser iterates) is.

XML sample extract (3 elements, they can be nested arbitrarily within themselves):

<A>
   <B>
      <C>
      </C>
   </B>
</A>
<B>
    <A>
    </A>
</B>

Following code, using ElementTree, worked well to iterate over elements. But I think ElementTree is not capable determining how deeply each element is nested. See below:

import xml.etree.ElementTree as ET

root = ET.parse('XML_file.xml')
tree = root.getroot()
for element in tree.iter():
    if element.tag in ("A","B","C"):
        print(element.tag)

This will get me the list of elements A,B,C in right order. But I need to print them out with information of their level,

So not only:

A
B
C
B
A

But something like:

A
--B
----C
B
--A

To be able to do this, I need to get the level of each element. Is there any suitable parser for python which can easily do this? I would imagine something like "element.hierarchyLevel" which would return some Integer index...

like image 738
VojtaBurian Avatar asked Apr 01 '13 17:04

VojtaBurian


People also ask

What is Etree in Python?

etree. ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.

What is the role of parse () function in ElementTree?

Parsing a file object or a filename with parse() returns an instance of the ET. ElementTree class, which represents the whole element hierarchy. On the other hand, parsing a string with fromstring() will return the specific root ET.

How does Python handle XML?

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().


1 Answers

Try using a recursive function, that keeps track of your "level".

import xml.etree.ElementTree as ET

def perf_func(elem, func, level=0):
    func(elem,level)
    for child in elem.getchildren():
        perf_func(child, func, level+1)

def print_level(elem,level):
    print '-'*level+elem.tag

root = ET.parse('XML_file.xml')
perf_func(root.getroot(), print_level)
like image 135
pradyunsg Avatar answered Nov 02 '22 10:11

pradyunsg