 

Convert XML to Python dict

I'm trying to write a dict class that processes an XML document, but I'm stuck and have run out of ideas. Any guidance on this would be great.

Code developed so far:

class XMLResponse(dict):
    def __init__(self, xml):
        self.result = True
        self.message = ''
        # TODO: parse `xml` and store its elements in self

    def __setattr__(self, name, val):
        self[name] = val

    def __getattr__(self, name):
        if name in self:
            return self[name]
        return None

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
XMLResponse(message)
asked Jun 18 '13 at 19:06 by Alfredo Solís
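For reference, here is a minimal sketch of how the XMLResponse class above could be completed with the standard library's xml.etree.ElementTree instead of a third-party module. It assumes a flat document like the sample note (the root's direct children are simple <tag>text</tag> elements); nested elements and attributes are not handled, and reporting parse errors through result/message is an assumption about what those fields were meant for.

```python
import xml.etree.ElementTree as ET


class XMLResponse(dict):
    """dict subclass with attribute access, filled from a flat XML document."""

    def __init__(self, xml):
        super().__init__()
        self.result = True   # stored as self['result'] via __setattr__
        self.message = ''
        try:
            root = ET.fromstring(xml)
        except ET.ParseError as exc:
            # Assumption: report malformed XML through result/message instead of raising
            self.result = False
            self.message = str(exc)
            return
        # Copy each direct child of the root element into the dict as tag -> text
        for child in root:
            self[child.tag] = child.text

    def __setattr__(self, name, val):
        self[name] = val

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; missing keys give None
        if name in self:
            return self[name]
        return None


message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
response = XMLResponse(message)
print(response.to, response.heading)  # Tove Reminder
print(response['from'])               # Jani
```

Because from is a Python keyword, that field has to be read as response['from'] rather than response.from. A child tag named result or message would also overwrite the status fields.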



People also ask

How to convert XML string into a dictionary in Python?

Use the xmltodict module to convert an XML string into a dictionary, or use the standard library's ElementTree (xml.etree.ElementTree; the older xml.etree.cElementTree alias was deprecated and removed in Python 3.9). XML (Extensible Markup Language) is used to store and transport small to medium amounts of data and is widely used for sharing structured information.
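A minimal sketch of the ElementTree route for the sample note from the question, using only the standard library. It assumes a flat document (no repeated tags, no attributes worth keeping); nested XML needs a recursive approach like the one in the second answer below.

```python
import xml.etree.ElementTree as ET

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""

# fromstring() parses the string and returns the root element, here <note>
root = ET.fromstring(message)

# Map each direct child's tag to its text (flat documents only:
# a repeated tag would overwrite the earlier one, and attributes are dropped)
note = {child.tag: child.text for child in root}
print(note)
# {'to': 'Tove', 'from': 'Jani', 'heading': 'Reminder', 'body': "Don't forget me this weekend!"}
```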

What is xmltodict in Python?

xmltodict is a Python module that makes working with XML feel like working with JSON. Install it from the terminal with:

pip install xmltodict

pprint: the pprint module provides the ability to "pretty-print" arbitrary Python data structures in a well-formatted, more readable way. Import both modules into your working space.

How do I parse an XML file in Python?

xmltodict.parse() parses the XML data into a nested dictionary (an OrderedDict in older releases). Alternatively, the standard library's ElementTree (xml.etree.ElementTree, which replaced the old cElementTree module) lets you parse and navigate an XML document by breaking it down into a tree of elements that is easy to work with.
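A short sketch of parsing a file and navigating the resulting tree with xml.etree.ElementTree. The <notes> document and its id attributes are made up for illustration, and io.StringIO stands in for a real file; ET.parse() also accepts a filename.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document with two <note> elements
xml_text = """<?xml version="1.0"?>
<notes>
  <note id="1"><to>Tove</to><from>Jani</from></note>
  <note id="2"><to>Jani</to><from>Tove</from></note>
</notes>"""

# ET.parse() takes a filename or file object and returns an ElementTree
tree = ET.parse(io.StringIO(xml_text))
root = tree.getroot()

# findall() matches direct children by tag; get() reads an attribute;
# findtext() returns the text of the first matching child
rows = [(note.get("id"), note.findtext("to"), note.findtext("from"))
        for note in root.findall("note")]
for row in rows:
    print(*row)
```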

How do I print XML as a dictionary?

Use xmltodict.parse() to parse the XML held in a variable and convert it into a dictionary. Besides xml_input, parse() accepts keyword arguments such as encoding, expat, process_namespaces (default False) and namespace_separator (default ':'). Then use pprint (pretty print) to print the dictionary in a well-formatted, readable way.
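A sketch of the pretty-printing step using only the standard library. The nested dict is written out by hand in the shape xmltodict.parse() gives for the sample note (as shown in the first answer below), so the example does not need xmltodict installed. pformat() returns the same text pprint() would print, and sort_dicts=False (Python 3.8+) keeps the keys in document order instead of sorting them.

```python
from pprint import pformat

# Hand-written dict with the shape xmltodict.parse() gives for the sample note
parsed = {"note": {"to": "Tove", "from": "Jani", "heading": "Reminder",
                   "body": "Don't forget me this weekend!"}}

# pformat() returns the text pprint() would print; sort_dicts=False
# (Python 3.8+) keeps keys in insertion order rather than sorting them
text = pformat(parsed, sort_dicts=False)
print(text)
```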


2 Answers

You can use the xmltodict module:

import xmltodict

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
print(xmltodict.parse(message)['note'])

which produces an OrderedDict (the output below is from Python 2; newer xmltodict releases may return a plain dict instead):

OrderedDict([(u'to', u'Tove'), (u'from', u'Jani'), (u'heading', u'Reminder'), (u'body', u"Don't forget me this weekend!")])

This can be converted to a plain dict if order doesn't matter:

print(dict(xmltodict.parse(message)['note']))

Prints:

{u'body': u"Don't forget me this weekend!", u'to': u'Tove', u'from': u'Jani', u'heading': u'Reminder'}
answered Oct 06 '22 at 00:10 by alecxe
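Note that dict(...) in the answer above only converts the outer mapping; nested elements stay OrderedDicts (and xmltodict represents repeated sibling elements as lists). A small recursive helper can convert the whole structure. to_plain is a hypothetical name, and the input below is built by hand rather than taken from xmltodict.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_plain(obj):
    """Recursively convert any Mapping (e.g. OrderedDict) to dict, including inside lists."""
    if isinstance(obj, Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain(item) for item in obj]
    return obj  # strings, None and numbers are returned unchanged


# Hand-built nested OrderedDict standing in for a parse result
nested = OrderedDict([
    ("note", OrderedDict([("to", "Tove"), ("from", "Jani")])),
])
plain = to_plain(nested)
print(plain)                # {'note': {'to': 'Tove', 'from': 'Jani'}}
print(type(plain["note"]))  # <class 'dict'>
```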



You'd think that by now we'd have a good answer to this one, but apparently we don't. After reviewing half a dozen similar questions on Stack Overflow, here is what worked for me:

from lxml import etree
# arrow is an awesome lib for dealing with dates in python
import arrow


# converts an etree to dict, useful to convert xml to dict
def etree2dict(tree):
    root, contents = recursive_dict(tree)
    return {root: contents}


def recursive_dict(element):
    # Children of an element marked type="array" become a list instead of a dict
    if element.get('type') == 'array':
        return element.tag, [dict(map(recursive_dict, child)) or getElementValue(child)
                             for child in element]
    # Otherwise children become a dict; a leaf (empty dict) falls back to its value.
    # Note: siblings with the same tag overwrite each other here.
    return element.tag, dict(map(recursive_dict, element)) or getElementValue(element)


def getElementValue(element):
    if element.text:
        text = element.text.strip()
        attr_type = element.get('type')
        if attr_type == 'integer':
            return int(text)
        if attr_type == 'float':
            return float(text)
        if attr_type == 'boolean':
            return text.lower() == 'true'
        if attr_type == 'datetime':
            # arrow >= 1.0: timestamp() is a method returning a float
            return arrow.get(text).timestamp()
        # No type attribute (or an unrecognized one): return the raw text
        return element.text
    if 'nil' in element.attrib:
        return None
    # Empty element: return its attributes, or None if it has none
    return element.attrib or None

And this is how you use it:

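from lxml import etree

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
tree = etree.fromstring(message)
print(etree2dict(tree))
# {'note': {'to': 'Tove', 'from': 'Jani', 'heading': 'Reminder', 'body': "Don't forget me this weekend!"}}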

Hope it helps :-)

answered Oct 06 '22 at 00:10 by Fred
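One limitation of recursive_dict in the answer above: dict(map(recursive_dict, element)) keeps only the last of several siblings that share a tag, so a <notes> element with two <note> children loses all but the last one. The following is a simplified, hypothetical variant (element_to_dict) that uses the standard library's ElementTree instead of lxml, collects repeated tags into lists, and deliberately leaves out the type and nil attribute handling.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Return {tag: value} for an element, turning repeated child tags into lists.

    Leaves map to their text (None for empty elements); attributes are ignored.
    """
    children = list(element)
    if not children:
        return {element.tag: element.text}
    value = {}
    for child in children:
        # Each recursive call returns a single-item dict {child_tag: child_value}
        (tag, child_value), = element_to_dict(child).items()
        if tag not in value:
            value[tag] = child_value
        elif isinstance(value[tag], list):
            value[tag].append(child_value)
        else:
            # Second occurrence of this tag: promote the stored value to a list
            value[tag] = [value[tag], child_value]
    return {element.tag: value}


root = ET.fromstring(
    "<notes><note><to>Tove</to></note><note><to>Jani</to></note></notes>"
)
result = element_to_dict(root)
print(result)  # {'notes': {'note': [{'to': 'Tove'}, {'to': 'Jani'}]}}
```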
