
Converting xml to dictionary using ElementTree

I'm looking for an XML-to-dictionary parser based on ElementTree. I have already found some, but they exclude attributes, and in my case I have a lot of attributes.

OHLÁLÁ asked Oct 07 '11 07:10




1 Answer

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON "specification":

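    from collections import defaultdict

    def etree_to_dict(t):
        # An element with attributes starts as a dict, otherwise as None
        d = {t.tag: {} if t.attrib else None}
        children = list(t)
        if children:
            # Group converted children by tag name
            dd = defaultdict(list)
            for dc in map(etree_to_dict, children):
                for k, v in dc.items():
                    dd[k].append(v)
            # A tag that occurs once maps to its value, a repeated tag to a list
            d = {t.tag: {k: v[0] if len(v) == 1 else v
                         for k, v in dd.items()}}
        if t.attrib:
            # Attributes are stored under '@'-prefixed keys
            d[t.tag].update(('@' + k, v)
                            for k, v in t.attrib.items())
        if t.text:
            text = t.text.strip()
            if children or t.attrib:
                # Text next to children or attributes goes under '#text'
                if text:
                    d[t.tag]['#text'] = text
            else:
                d[t.tag] = text
        return d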

It is used like this:

    # cElementTree was removed in Python 3.9; ElementTree uses the C accelerator when available
    from xml.etree import ElementTree as ET

    e = ET.XML('''
    <root>
      <e />
      <e>text</e>
      <e name="value" />
      <e name="value">text</e>
      <e> <a>text</a> <b>text</b> </e>
      <e> <a>text</a> <a>text</a> </e>
      <e> text <a>text</a> </e>
    </root>
    ''')

    from pprint import pprint

    d = etree_to_dict(e)

    pprint(d)

The output of this example (as per above-linked "specification") should be:

    {'root': {'e': [None,
                    'text',
                    {'@name': 'value'},
                    {'#text': 'text', '@name': 'value'},
                    {'a': 'text', 'b': 'text'},
                    {'a': ['text', 'text']},
                    {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)
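
One consequence of this convention is that a child value can be None, a plain string, a dict, or a list, depending on the input. The following is a minimal sketch of how a consumer might normalize that; the helper names `as_list` and `split_node` are hypothetical and not part of the answer's snippet, and the dict literal simply repeats the output shown above.

```python
def as_list(value):
    """Wrap a single child in a list so single and repeated children iterate alike."""
    return value if isinstance(value, list) else [value]

def split_node(node):
    """Return (attributes, text, children) for one converted element."""
    if node is None:            # empty element such as <e />
        return {}, None, {}
    if isinstance(node, str):   # element with text only
        return {}, node, {}
    # '@'-prefixed keys are attributes, '#text' is text, the rest are child tags
    attrs = {k[1:]: v for k, v in node.items() if k.startswith('@')}
    text = node.get('#text')
    children = {k: as_list(v) for k, v in node.items()
                if not k.startswith(('@', '#'))}
    return attrs, text, children

# Same structure as the output shown above
d = {'root': {'e': [None,
                    'text',
                    {'@name': 'value'},
                    {'#text': 'text', '@name': 'value'},
                    {'a': 'text', 'b': 'text'},
                    {'a': ['text', 'text']},
                    {'#text': 'text', 'a': 'text'}]}}

for node in as_list(d['root']['e']):
    print(split_node(node))
```

Normalizing at the point of use keeps the converted dict itself as simple as the XML allows, at the cost of a small helper on the reading side.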


Update

If you want to do the reverse and emit an XML string from a JSON/dict, you can use:

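    from xml.etree import ElementTree as ET

    try:
        basestring
    except NameError:  # python3
        basestring = str

    def dict_to_etree(d):
        def _to_etree(d, root):
            if not d:
                pass  # None or empty: leave the element empty
            elif isinstance(d, basestring):
                root.text = d
            elif isinstance(d, dict):
                for k, v in d.items():
                    assert isinstance(k, basestring)
                    if k.startswith('#'):
                        # '#text' holds the element's text
                        assert k == '#text' and isinstance(v, basestring)
                        root.text = v
                    elif k.startswith('@'):
                        # '@name' becomes the attribute 'name'
                        assert isinstance(v, basestring)
                        root.set(k[1:], v)
                    elif isinstance(v, list):
                        # a list becomes repeated child elements with the same tag
                        for e in v:
                            _to_etree(e, ET.SubElement(root, k))
                    else:
                        _to_etree(v, ET.SubElement(root, k))
            else:
                # always fails here, reporting the unsupported type and value
                assert d == 'invalid type', (type(d), d)
        # the top-level dict must have exactly one key: the root tag
        assert isinstance(d, dict) and len(d) == 1
        tag, body = next(iter(d.items()))
        node = ET.Element(tag)
        _to_etree(body, node)
        return node

    print(ET.tostring(dict_to_etree(d)))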
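
To make the reverse mapping concrete, here is a small sketch of the ElementTree calls that one dict entry translates into, written out by hand. The input dict in the comment is an illustrative assumption, not taken from the example above. Note that on Python 3 `ET.tostring` returns bytes by default, and passing `encoding='unicode'` returns a str instead.

```python
from xml.etree import ElementTree as ET

# Hand-built equivalent of the (hypothetical) input
# {'e': {'@name': 'value', '#text': 'text', 'a': ['x', 'y']}}
e = ET.Element('e')             # the single top-level key becomes the root tag
e.set('name', 'value')          # '@name' becomes an attribute
e.text = 'text'                 # '#text' becomes the element text
for item in ['x', 'y']:         # a list becomes repeated <a> children
    ET.SubElement(e, 'a').text = item

xml_bytes = ET.tostring(e)                     # bytes on Python 3
xml_text = ET.tostring(e, encoding='unicode')  # str
print(xml_text)  # <e name="value">text<a>x</a><a>y</a></e>
```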
K3---rnc answered Sep 22 '22 04:09