
How to flatten an XML file in Python

Tags:

python

xml

Is there a library or mechanism I can use to flatten the XML file?

Existing:

<A>
    <B>
        <ConnectionType>a</ConnectionType>
        <StartTime>00:00:00</StartTime>
        <EndTime>00:00:00</EndTime>
        <UseDataDictionary>N</UseDataDictionary>
    </B>
</A>

Desired:

A.B.ConnectionType = a
A.B.StartTime = 00:00:00
A.B.EndTime = 00:00:00
A.B.UseDataDictionary = N
James Raitsev asked Aug 09 '16 13:08


2 Answers

You can do this by parsing the XML file into a dictionary with xmltodict and then flattening that dictionary with a small recursive helper (adapted from the Code Review answer linked in the code).

Example:

# Original code: https://codereview.stackexchange.com/a/21035
from collections import OrderedDict

import xmltodict


def flatten_dict(d):
    """Flatten nested dicts into one level, joining keys with dots."""
    def items():
        for key, value in d.items():
            if isinstance(value, dict):
                # Recurse and prefix each nested key with its parent key
                for subkey, subvalue in flatten_dict(value).items():
                    yield key + "." + subkey, subvalue
            else:
                yield key, value

    return OrderedDict(items())


# Convert to dict (xmltodict.parse accepts a binary file object)
with open('test.xml', 'rb') as f:
    xml_content = xmltodict.parse(f)

# Flatten dict
flattened_xml = flatten_dict(xml_content)

# Print in desired format
for k, v in flattened_xml.items():
    print('{} = {}'.format(k, v))

Output:

A.B.ConnectionType = a
A.B.StartTime = 00:00:00
A.B.EndTime = 00:00:00
A.B.UseDataDictionary = N
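
One caveat: xmltodict turns repeated sibling elements into a list, and flatten_dict only recurses into dicts, so such a list would come out as a single unflattened value. Below is a minimal sketch of one way to handle that, assuming index suffixes such as B[0] are acceptable in the keys. The flatten helper and the sample dictionary are illustrative and not part of the original answer; the sample is hand-written to resemble what xmltodict produces for two <B> siblings.

```python
def flatten(value, prefix=""):
    """Yield (dotted_key, leaf_value) pairs from nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Join parent and child keys with a dot, except at the top level
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Repeated elements get an index suffix on the parent key
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Hypothetical data shaped like a parsed document with two <B> siblings
parsed = {"A": {"B": [{"StartTime": "00:00:00"}, {"StartTime": "01:00:00"}]}}
flat = dict(flatten(parsed))

for key, val in flat.items():
    print(f"{key} = {val}")
```

To use it with a real file, pass the result of xmltodict.parse to flatten instead of the hand-written dictionary.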
DocZerø answered Sep 22 '22 16:09


This is not a complete implementation, but you could take advantage of lxml's getelementpath:

xml = """<A>
            <B>
               <ConnectionType>a</ConnectionType>
               <StartTime>00:00:00</StartTime>
               <EndTime>00:00:00</EndTime>
               <UseDataDictionary>N
               <UseDataDictionary2>G</UseDataDictionary2>
               </UseDataDictionary>
            </B>
       </A>"""


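from io import StringIO  # Python 3 replacement for the old StringIO module

from lxml import etree

tree = etree.parse(StringIO(xml))
root = tree.getroot()

for node in tree.iter():
    # node.text is None for elements without text, so guard before stripping
    text = (node.text or "").strip()
    if node is root or not text:
        continue
    # getelementpath returns a path relative to the root, e.g. "B/StartTime"
    path = tree.getelementpath(node).replace("/", ".")
    print("{}.{} = {}".format(root.tag, path, text))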

Which gives you:

A.B.ConnectionType = a
A.B.StartTime = 00:00:00
A.B.EndTime = 00:00:00
A.B.UseDataDictionary = N
A.B.UseDataDictionary.UseDataDictionary2 = G
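
If installing lxml is not an option, roughly the same walk can be done with the standard library's xml.etree.ElementTree. This is a minimal sketch rather than a full implementation: flatten_xml and the sample string are names made up for illustration, repeated siblings end up with identical paths, and namespaced tags would keep their {uri} prefix.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Return (dotted_path, text) pairs for every element with non-blank text."""
    rows = []

    def walk(element, path):
        # element.text is None when the element has no leading text
        text = (element.text or "").strip()
        if text:
            rows.append((path, text))
        for child in element:
            walk(child, path + "." + child.tag)

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return rows


sample = (
    "<A><B><ConnectionType>a</ConnectionType>"
    "<StartTime>00:00:00</StartTime></B></A>"
)
rows = flatten_xml(sample)

for path, text in rows:
    print(f"{path} = {text}")
```

The path is built up while recursing, so no separate lookup like getelementpath is needed.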
Padraic Cunningham answered Sep 23 '22 16:09