I am trying to parse elements with certain tag from XML file with Python and generate output excel document, which would contain elements and also preserve their hierarchy.
My problem is that I cannot figure out how deeply nested each element (over which parser iterates) is.
XML sample extract (3 elements, they can be nested arbitrarily within themselves):
<A>
<B>
<C>
</C>
</B>
</A>
<B>
<A>
</A>
</B>
Following code, using ElementTree, worked well to iterate over elements. But I think ElementTree is not capable determining how deeply each element is nested. See below:
import xml.etree.ElementTree as ET
root = ET.parse('XML_file.xml')
tree = root.getroot()
for element in tree.iter():
if element.tag in ("A","B","C"):
print(element.tag)
This will get me the list of elements A,B,C in right order. But I need to print them out with information of their level,
So not only:
A
B
C
B
A
But something like:
A
--B
----C
B
--A
To be able to do this, I need to get the level of each element. Is there any suitable parser for python which can easily do this? I would imagine something like "element.hierarchyLevel" which would return some Integer index...
etree. ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
Parsing a file object or a filename with parse() returns an instance of the ET. ElementTree class, which represents the whole element hierarchy. On the other hand, parsing a string with fromstring() will return the specific root ET.
To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().
Try using a recursive function, that keeps track of your "level".
import xml.etree.ElementTree as ET
def perf_func(elem, func, level=0):
func(elem,level)
for child in elem.getchildren():
perf_func(child, func, level+1)
def print_level(elem,level):
print '-'*level+elem.tag
root = ET.parse('XML_file.xml')
perf_func(root.getroot(), print_level)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With