I have an XML file in the following format:
<doc>
<id name="X">
<type name="A">
<min val="100" id="80"/>
<max val="200" id="90"/>
</type>
<type name="B">
<min val="100" id="20"/>
<max val="20" id="90"/>
</type>
</id>
<type...>
</type>
</doc>
I would like to parse this document and build a hash table
{X: {"A": [(100,80), (200,90)], "B": [(100,20), (20,90)]}, Y: .....}
How would I do this in Python?
I disagree with the suggestion in other answers to use minidom -- that's a so-so Python adaptation of a standard originally conceived for other languages, usable but not a great fit. The recommended approach in modern Python is ElementTree.
The same interface is also implemented, faster, in third party module lxml, but unless you need blazing speed the version included with the Python standard library is fine (and faster than minidom anyway) -- the key point is to program to that interface, then you can always switch to a different implementation of the same interface in the future if you want to, with minimal changes to your own code.
For example, after the needed imports &c, the following code is a minimal implementation of your example (it does not verify that the XML is correct, just extracts the data assuming correctness -- adding various kinds of checks is pretty easy of course):
from xml.etree import ElementTree as et # or, import any other, faster version of ET
def xml2data(xmlfile):
tree = et.parse(xmlfile)
data = {}
for anid in tree.getroot().getchildren():
currdict = data[anid.get('name')] = {}
for atype in anid.getchildren():
currlist = currdict[atype.get('name')] = []
for c in atype.getchildren():
currlist.append((c.get('val'), c.get('id')))
return data
This produces your desired result given your sample input.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With