There seem to be lots of solutions on StackOverflow for converting XML to a Python dictionary, but none of them generate the output I'm looking for. I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<section1
mystatus:field1="data1"
mystatus:field2="data2" />
<section2
mystatus:lineA="outputA"
mystatus:lineB="outputB" />
</status>
lxml has an elegantly simple solution for converting XML to a dictionary:
def recursive_dict(element):
    return element.tag, dict(map(recursive_dict, element)) or element.text
Unfortunately, I get:
('status', {'section2': None, 'section1': None})
instead of:
('status', {'section1':
{'field1':'data1','field2':'data2'},
'section2':
{'lineA':'outputA','lineB':'outputB'}
})
I can't figure out how to get my desired output without greatly complicating the recursive_dict() function.
I'm not tied to lxml, and I'm also fine with a different organization of the dictionary, as long as it gives me all the info in the xml. Thanks!
lxml is a Python library for parsing and processing XML and HTML files, and it can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but developers who need finer control sometimes prefer to write their own XML and HTML processing code. This is where the lxml library comes into play.
The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API.
lxml aims to provide a Pythonic API by following as much as possible the ElementTree API. We're trying to avoid inventing too many new APIs, or you having to learn new things -- XML is complicated enough.
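Because lxml follows the ElementTree API, the standard library's xml.etree.ElementTree can be used to see why recursive_dict() returns None for the XML in the question. This is only a rough sketch (the variable names are made up for illustration): the section elements have no children and no text, and all of their data sits in namespaced attributes.

```python
import xml.etree.ElementTree as ET

# The XML from the question, as bytes so the encoding declaration is honored.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
  <section1 mystatus:field1="data1" mystatus:field2="data2" />
  <section2 mystatus:lineA="outputA" mystatus:lineB="outputB" />
</status>"""

root = ET.fromstring(XML)
section1 = root.find("section1")

# The element is empty: no child elements and no text,
# so "dict(children) or element.text" evaluates to None.
children = list(section1)
text = section1.text

# The data lives in the attributes, whose keys use Clark notation:
# "{namespace-uri}localname".
attributes = dict(section1.attrib)
print(children, text, attributes)
```

So any converter that looks only at element.text and child elements will miss this data; it has to read element.attrib as well.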
Personally, I like xmltodict. You can install it with pip: pip install xmltodict.
Note that, depending on the version, this creates OrderedDict objects rather than plain dicts. Example usage:
import xmltodict as xd

# Open in binary mode so the parser can apply the encoding declared in the XML
with open('test.xml', 'rb') as f:
    d = xd.parse(f)
I found a solution in this gist: https://gist.github.com/jacobian/795571
def elem2dict(node):
    """
    Convert an lxml.etree node tree into a dict.
    """
    result = {}
    for element in node.iterchildren():
        # Remove the namespace part of the tag ('{uri}name' -> 'name')
        key = element.tag.split('}')[1] if '}' in element.tag else element.tag
        # Use the text as the value if the element contains non-whitespace text;
        # otherwise recurse into its children
        if element.text and element.text.strip():
            value = element.text
        else:
            value = elem2dict(element)
        if key in result:
            if type(result[key]) is list:
                result[key].append(value)
            else:
                # Second occurrence of this key: collect both values in a list
                # (the existing value may be a str, which has no .copy())
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result
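Note that the function above only looks at element text and child elements, so for attribute-only XML like the one in the question it would still return empty dicts. Here is a minimal sketch of one way to include attributes, using the standard library's xml.etree.ElementTree instead of lxml. The names local_name and elem_to_dict and the "#text" key are my own assumptions for illustration and are not part of the gist.

```python
import xml.etree.ElementTree as ET


def local_name(qualified):
    """Strip a '{namespace-uri}' prefix (Clark notation), if present."""
    return qualified.rsplit("}", 1)[-1]


def elem_to_dict(element):
    """Convert an element into a dict of its attributes and children.

    Hypothetical helper: an element with no attributes and no children
    collapses to its stripped text, or None if it has no text.
    """
    result = {local_name(k): v for k, v in element.attrib.items()}
    for child in element:
        key = local_name(child.tag)
        value = elem_to_dict(child)
        if key in result:
            # Repeated keys are collected into a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    text = (element.text or "").strip()
    if not result:
        return text or None
    if text:
        result["#text"] = text
    return result


# The XML from the question, as bytes so the encoding declaration is honored.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
  <section1 mystatus:field1="data1" mystatus:field2="data2" />
  <section2 mystatus:lineA="outputA" mystatus:lineB="outputB" />
</status>"""

root = ET.fromstring(XML)
converted = (local_name(root.tag), elem_to_dict(root))
print(converted)
```

Dropping the namespace keeps the keys short, but it means two attributes with the same local name in different namespaces would collide and end up in a list, so keep the full qualified name if that matters for your data.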