Convert XML to dictionary in Python using lxml

There seem to be lots of solutions on StackOverflow for converting XML to a Python dictionary, but none of them generate the output I'm looking for. I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<section1
    mystatus:field1="data1"
    mystatus:field2="data2" />
<section2
    mystatus:lineA="outputA"
    mystatus:lineB="outputB" />
</status>

lxml has an elegantly simple solution for converting XML to a dictionary:

def recursive_dict(element):
    return element.tag, dict(map(recursive_dict, element)) or element.text

Unfortunately, I get:

('status', {'section2': None, 'section1': None})

instead of:

('status', {'section1':
                       {'field1':'data1','field2':'data2'},
            'section2':
                       {'lineA':'outputA','lineB':'outputB'}
            })

I can't figure out how to get my desired output without greatly complicating the recursive_dict() function.

I'm not tied to lxml, and I'm also fine with a different organization of the dictionary, as long as it gives me all the info in the xml. Thanks!

proximous asked Oct 31 '14 01:10


2 Answers

Personally I like xmltodict. You can install it with pip: pip install xmltodict.

Note that parse() may return OrderedDict objects rather than plain dicts, depending on the xmltodict and Python versions. Example usage:

import xmltodict as xd

# Open in binary mode: on Python 3 the underlying expat parser expects bytes from a file object
with open('test.xml', 'rb') as f:
    d = xd.parse(f)
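
As a rough sketch of what this gives for the XML in the question (parsed here from an inline string instead of test.xml, which is an assumption made only to keep the example self-contained): with its default settings xmltodict keeps attributes under keys prefixed with '@', and the namespace prefix stays part of the key.

```python
import xmltodict

# The XML from the question, inlined so the example does not depend on a file
XML = """<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<section1
    mystatus:field1="data1"
    mystatus:field2="data2" />
<section2
    mystatus:lineA="outputA"
    mystatus:lineB="outputB" />
</status>"""

doc = xmltodict.parse(XML)

# Attributes are stored under '@'-prefixed keys, namespace prefix included
section1 = doc['status']['section1']
print(section1['@mystatus:field1'])  # data1
print(section1['@mystatus:field2'])  # data2
```

If the '@' prefix is unwanted, parse() also accepts an attr_prefix argument (for example attr_prefix=''), though element and attribute names could then collide.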
TheSchwa answered Oct 11 '22 04:10


I found a solution in this gist: https://gist.github.com/jacobian/795571

def elem2dict(node):
    """
    Convert an lxml.etree node tree into a dict.
    """
    result = {}

    for element in node.iterchildren():
        # Remove namespace prefix
        key = element.tag.split('}')[1] if '}' in element.tag else element.tag

        # Use the text if it contains non-whitespace content; otherwise recurse into the children
        if element.text and element.text.strip():
            value = element.text
        else:
            value = elem2dict(element)

        if key in result:
            # Repeated tag: collect the values in a list
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                # A list literal also works when the existing value is a str, which has no .copy()
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result
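
One caveat: elem2dict only looks at element text and child elements, while the XML in the question keeps all of its data in attributes, so it would return empty dicts for section1 and section2. Below is a minimal sketch of a variant that also copies attributes. The names local_name and elem2dict_with_attrs are made up for this example, and it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; lxml elements expose the same .tag and .attrib, so the same function should work there too.

```python
import xml.etree.ElementTree as ET

# The XML from the question, as bytes so the encoding declaration is honoured
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<section1
    mystatus:field1="data1"
    mystatus:field2="data2" />
<section2
    mystatus:lineA="outputA"
    mystatus:lineB="outputB" />
</status>"""


def local_name(name):
    """Drop a leading '{namespace-uri}' from a tag or attribute name."""
    return name.split('}', 1)[1] if '}' in name else name


def elem2dict_with_attrs(node):
    """Hypothetical elem2dict variant that starts from the node's attributes."""
    # Attributes first; namespaced ones arrive as '{uri}name'
    result = {local_name(k): v for k, v in node.attrib.items()}

    for child in node:
        key = local_name(child.tag)
        # Text wins if present; otherwise recurse (which picks up attributes)
        if child.text and child.text.strip():
            value = child.text
        else:
            value = elem2dict_with_attrs(child)

        if key in result:
            # Repeated key: collect the values in a list
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result


root = ET.fromstring(XML)
print((root.tag, elem2dict_with_attrs(root)))
# ('status', {'section1': {'field1': 'data1', 'field2': 'data2'},
#             'section2': {'lineA': 'outputA', 'lineB': 'outputB'}})
```

This sketch drops the attributes of any element that also has text, and an attribute that shares a name with a child element ends up merged into a list; both would need a different layout (for example a separate key for attributes) if they matter for your data.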
guettli answered Oct 11 '22 02:10