
How to get all the info in XML into dictionary with Python

Let's say I have an XML file as follows.

<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>

I need to read this file into a dictionary something like this.

dict["A.B1.C1"] = "blah"
dict["A.B1.C2"] = "blah"
dict["A.B2.C1"] = "blah"
dict["A.B2.C2"] = "blah"

The exact format of the dict doesn't matter, though; I just want to read all the info into Python variables.

The catch is that I don't know the structure of the XML in advance, so I want to read everything it contains into a dictionary.

Is there any way to do this with Python?

prosseek asked Jul 10 '10 01:07


3 Answers

You can use the untangle library. untangle.parse() takes an XML file as input and returns a Python object that represents the document.

Let's take the following XML file as an example and name it test_xml.xml:

<A>
 <B>
  <C>"blah1"</C>
  <C>"blah2"</C>
 </B>
 <B>
  <C>"blah3"</C>
  <C>"blah4"</C>
 </B>
</A>  

Now let's convert the XML file above into a Python object and access its elements:

>>> import untangle
>>> input_file = "/home/tests/test_xml.xml"  # Full path to your XML file
>>> obj = untangle.parse(input_file)
>>> obj.A.B[0].C[0].cdata
u'"blah1"'
>>> obj.A.B[0].C[1].cdata
u'"blah2"'
>>> obj.A.B[1].C[0].cdata
u'"blah3"'
>>> obj.A.B[1].C[1].cdata
u'"blah4"'
Ysh answered Oct 17 '22 02:10


I usually use the lxml.objectify library for quick XML parsing.

With your XML string, you can do:

from lxml import objectify
root = objectify.fromstring(xml_string)

Here root is the top-level <A> element itself, so you reach its children directly. You can get individual elements using a dictionary interface:

value = root["B"][0]["C"][0]

Or, if you prefer:

value = root.B[0].C[0]
Lior answered Oct 17 '22 01:10


I usually parse XML using the ElementTree module in the standard library. It does not give you a dictionary; instead you get a much more useful DOM-like tree structure, in which you can iterate over each element to reach its children.

from xml.etree import ElementTree as ET

xml = ET.parse("<path-to-xml-file>")
root_element = xml.getroot()

for child in root_element:
   ...
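
As a minimal sketch of that iteration, assuming the XML from the question is held in a string, a recursive walk over the tree could look like the following. The names XML_TEXT and walk are illustrative and not part of the original answer.

```python
from xml.etree import ElementTree as ET

# The sample document from the question, inlined so the sketch is self-contained.
XML_TEXT = """<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>"""


def walk(element, depth=0):
    """Yield (depth, tag, stripped text) for element and all its descendants."""
    text = (element.text or "").strip()  # element.text is None for empty elements
    yield depth, element.tag, text
    for child in element:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)


root_element = ET.fromstring(XML_TEXT)
rows = list(walk(root_element))
for depth, tag, text in rows:
    print("  " * depth + tag, text)
```

ET.fromstring() is used here only so the example needs no file on disk; with a file, ET.parse(...).getroot() gives the same kind of root element.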

If there is a specific need to parse it into a dictionary, rather than getting the information you need from the DOM tree, a recursive function that builds one from the root node would look something like this:

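def xml_dict(node, path="", dic=None):
    # Flatten an ElementTree node and its descendants into one dictionary
    # keyed by numbered, dotted paths such as "A1.B1.C1".
    if dic is None:
        dic = {}
    name_prefix = path + ("." if path else "") + node.tag
    # Collect the indices already taken by earlier siblings with this tag.
    numbers = set()
    for similar_name in dic:
        if similar_name.startswith(name_prefix):
            suffix = similar_name[len(name_prefix):].split(".")[0]
            if suffix.isdigit():  # skip tags that merely share a prefix, e.g. "B" vs "BC"
                numbers.add(int(suffix))
    if not numbers:
        numbers.add(0)
    index = max(numbers) + 1
    name = name_prefix + str(index)
    # node.text and childnode.tail are None when there is no text, so fall back to "".
    dic[name] = (node.text or "") + "<...>".join(
        childnode.tail or "" for childnode in node)
    for childnode in node:
        xml_dict(childnode, name, dic)
    return dic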

For the XML in the question, this yields the following dictionary:

{'A1': '\n \n <...>\n',
 'A1.B1': '\n  \n  <...>\n ',
 'A1.B1.C1': '"blah"',
 'A1.B1.C2': '"blah"',
 'A1.B2': '\n  \n  <...>\n ',
 'A1.B2.C1': '"blah"',
 'A1.B2.C2': '"blah"'}

(I find the DOM form more useful)
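
If only the leaf values are wanted, with keys shaped like the ones in the question ("A.B1.C1"), a shorter variant is possible. The sketch below is not part of the original answer: the helper name flatten is hypothetical, and it assumes that attributes and text mixed in between child elements can be ignored.

```python
from collections import Counter
from xml.etree import ElementTree as ET


def flatten(element, prefix, out=None):
    """Map each leaf element's text to a dotted path such as 'A.B1.C1'."""
    if out is None:
        out = {}
    children = list(element)
    if not children:
        # Leaf element: store its text (None becomes an empty string).
        out[prefix] = (element.text or "").strip()
        return out
    seen = Counter()  # how many children with each tag we have met under this parent
    for child in children:
        seen[child.tag] += 1
        flatten(child, f"{prefix}.{child.tag}{seen[child.tag]}", out)
    return out


root = ET.fromstring(
    '<A><B><C>"blah"</C><C>"blah"</C></B>'
    '<B><C>"blah"</C><C>"blah"</C></B></A>'
)
result = flatten(root, root.tag)
for key, value in result.items():
    print(key, "=", value)
```

Counting tags per parent avoids scanning the whole dictionary for existing keys, at the cost of dropping the entries for non-leaf elements.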

jsbueno answered Oct 17 '22 00:10