
Getting a list of XML tags in a file using xml.etree.ElementTree

As the title says, I need to get a list of the XML tags in a file using the xml.etree.ElementTree library.

I am aware that there are properties and methods like ETVar.child, ETVar.getroot(), ETVar.tag, ETVar.attrib.

But to use them and get even just the tag names on level 2, I had to use nested for loops.

At the moment I have something like this:

for xmlChild in xmlRootTag:
    if xmlChild.tag:
        print(xmlChild.tag)

The goal is to get a list of ALL XML tags in the file, including deeply nested ones, with duplicates removed.

To give a better idea, here is a possible example of the XML:

<root>
 <firstLevel>
  <secondlevel level="2">
    <thirdlevel>
      <fourth>text</fourth>
      <fourth2>text</fourth2>
    </thirdlevel>
  </secondlevel>
 </firstLevel>
</root>
FanaticD asked Apr 13 '15 01:04



1 Answer

I did more research on the subject and found a suitable solution. Since this is probably a common task, I'm posting it as an answer in the hope that it helps others.

What I was looking for was the iter() method of ElementTree.

import xml.etree.ElementTree as ET

# Load and parse the file
xmlTree = ET.parse('myXMLFile.xml')

elemList = []

# iter() walks the root element and all of its descendants
for elem in xmlTree.iter():
    elemList.append(elem.tag)

# Remove duplicates by converting to a set and back to a list
elemList = list(set(elemList))

# Print the result
print(elemList)
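
For anyone who wants to try this without a file on disk, here is a minimal sketch of the same idea. It parses a well-formed version of the sample from the question out of a string (the string and the names SAMPLE_XML and unique_tags are mine, for illustration only) and collects the tags with a set comprehension:

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the question's sample (closing tags fixed; an assumption).
SAMPLE_XML = """
<root>
 <firstLevel>
  <secondlevel level="2">
    <thirdlevel>
      <fourth>text</fourth>
      <fourth2>text</fourth2>
    </thirdlevel>
  </secondlevel>
 </firstLevel>
</root>
""".strip()

# fromstring() returns the root Element; Element.iter() visits it and every descendant.
root = ET.fromstring(SAMPLE_XML)

# The set comprehension keeps each tag name once; sorted() gives a stable order for printing.
unique_tags = sorted({elem.tag for elem in root.iter()})
print(unique_tags)
# ['firstLevel', 'fourth', 'fourth2', 'root', 'secondlevel', 'thirdlevel']
```

Only tag names are collected here; attributes such as level="2" are ignored.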

Important notes

  • xml.etree.ElementTree is part of the Python standard library
  • the sample was written for Python v3.2.3
  • duplicates are removed by converting the list to a set, which holds only unique values, and then converting it back to a list; note that a set does not keep the original order of the tags
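
If the order in which the tags first appear matters, one possible alternative to the set round-trip is dict.fromkeys. This is only a sketch: the helper name unique_tags_in_order and the tiny test document are made up for illustration, and it relies on dicts keeping insertion order, which the language guarantees only from Python 3.7 on (so not on the 3.2.3 mentioned above).

```python
import xml.etree.ElementTree as ET


def unique_tags_in_order(root):
    """Return each tag once, in the order iter() first encounters it."""
    # dict keys are unique and keep insertion order (guaranteed from Python 3.7).
    return list(dict.fromkeys(elem.tag for elem in root.iter()))


# Hypothetical document with repeated tags, used only to show the behaviour.
doc = ET.fromstring("<a><b><c/></b><b><d/></b><c/></a>")
print(unique_tags_in_order(doc))  # ['a', 'b', 'c', 'd']
```

The result follows document order, so a tag that shows up again deeper in the tree keeps the position of its first occurrence.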
FanaticD answered Sep 19 '22 13:09
