I'm looking for an XML to dictionary parser using ElementTree, I already found some but they are excluding the attributes, and in my case I have a lot of attributes.
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
There are two ways to parse the file using 'ElementTree' module. The first is by using the parse() function and the second is fromstring() function. The parse () function parses XML document which is supplied as a file whereas, fromstring parses XML when supplied as a string i.e within triple quotes.
The dict() is generally used to create a data dictionary to hold data in key-value pairs. It results in a good mapping between key and value pairs. In web applications, XML (Extensible Markup Language) is used in many fields. We can easily render data stored in XML.
The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON "specification":
from collections import defaultdict def etree_to_dict(t): d = {t.tag: {} if t.attrib else None} children = list(t) if children: dd = defaultdict(list) for dc in map(etree_to_dict, children): for k, v in dc.items(): dd[k].append(v) d = {t.tag: {k: v[0] if len(v) == 1 else v for k, v in dd.items()}} if t.attrib: d[t.tag].update(('@' + k, v) for k, v in t.attrib.items()) if t.text: text = t.text.strip() if children or t.attrib: if text: d[t.tag]['#text'] = text else: d[t.tag] = text return d
It is used:
from xml.etree import cElementTree as ET e = ET.XML(''' <root> <e /> <e>text</e> <e name="value" /> <e name="value">text</e> <e> <a>text</a> <b>text</b> </e> <e> <a>text</a> <a>text</a> </e> <e> text <a>text</a> </e> </root> ''') from pprint import pprint d = etree_to_dict(e) pprint(d)
The output of this example (as per above-linked "specification") should be:
{'root': {'e': [None, 'text', {'@name': 'value'}, {'#text': 'text', '@name': 'value'}, {'a': 'text', 'b': 'text'}, {'a': ['text', 'text']}, {'#text': 'text', 'a': 'text'}]}}
Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)
If you want to do the reverse, emit an XML string from a JSON/dict, you can use:
try: basestring except NameError: # python3 basestring = str def dict_to_etree(d): def _to_etree(d, root): if not d: pass elif isinstance(d, str): root.text = d elif isinstance(d, dict): for k,v in d.items(): assert isinstance(k, str) if k.startswith('#'): assert k == '#text' and isinstance(v, str) root.text = v elif k.startswith('@'): assert isinstance(v, str) root.set(k[1:], v) elif isinstance(v, list): for e in v: _to_etree(e, ET.SubElement(root, k)) else: _to_etree(v, ET.SubElement(root, k)) else: assert d == 'invalid type', (type(d), d) assert isinstance(d, dict) and len(d) == 1 tag, body = next(iter(d.items())) node = ET.Element(tag) _to_etree(body, node) return node print(ET.tostring(dict_to_etree(d)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With