How to get all the info in XML into dictionary with Python

Tags:

python

dictionary

xml

Let's say I have an XML file as follows.

<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>

I need to read this file into a dictionary, something like this:

dict["A.B1.C1"] = "blah"
dict["A.B1.C2"] = "blah"
dict["A.B2.C1"] = "blah"
dict["A.B2.C2"] = "blah"

The format of the dict doesn't matter, though; I just want to read all the info into Python variables.

The thing is that I don't know the structure of the XML in advance; I just want to read all of it into a dictionary.

Is there any way to do this with Python?


asked Jul 10 '10 01:07

prosseek

3 Answers

You can use the untangle library in Python. untangle.parse() takes an XML file as input and returns a Python object that represents the XML document.

Let's take the following XML file as an example and name it test_xml.xml:

<A>
 <B>
  <C>"blah1"</C>
  <C>"blah2"</C>
 </B>
 <B>
  <C>"blah3"</C>
  <C>"blah4"</C>
 </B>
</A>

Now let's convert the above XML file into a Python object and access its elements:

>>> import untangle

>>> input_file = "/home/tests/test_xml.xml"  # Full path to your XML file
>>> obj = untangle.parse(input_file)

>>> obj.A.B[0].C[0].cdata
u'"blah1"'
>>> obj.A.B[0].C[1].cdata
u'"blah2"'
>>> obj.A.B[1].C[0].cdata
u'"blah3"'
>>> obj.A.B[1].C[1].cdata
u'"blah4"'
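untangle is a third-party package. If you would rather stay in the standard library, here is a minimal sketch that builds plain nested dicts and lists with a similar access shape (data["A"]["B"][1]["C"][0] instead of obj.A.B[1].C[0]) without knowing the structure in advance. The helper name to_nested is hypothetical, and the sketch assumes attributes can be ignored and that elements with children carry no meaningful text of their own.

```python
import xml.etree.ElementTree as ET

def to_nested(elem):
    """Convert an element into {"tag": [child, ...]} dicts, or its text for leaves."""
    children = list(elem)
    if not children:
        # Leaf element: keep only its text, with surrounding whitespace stripped.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        # Group repeated tags into lists, so B[0], B[1] index like untangle.
        result.setdefault(child.tag, []).append(to_nested(child))
    return result

xml_string = """<A>
 <B>
  <C>"blah1"</C>
  <C>"blah2"</C>
 </B>
 <B>
  <C>"blah3"</C>
  <C>"blah4"</C>
 </B>
</A>"""

root = ET.fromstring(xml_string)
data = {root.tag: to_nested(root)}  # wrap so the root tag is the top-level key
print(data["A"]["B"][1]["C"][0])
```

Repeated tags always become lists, even when a tag occurs only once, which keeps the access pattern uniform for documents you have not seen before.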

answered Oct 17 '22 02:10

Ysh

I usually use the lxml.objectify library for quick XML parsing.

With your XML string, you can do:

from lxml import objectify
root = objectify.fromstring(xml_string)

objectify.fromstring() returns the root <A> element itself, so you then get individual elements starting from its children, using a dictionary interface:

value = root["B"][0]["C"][0]

Or, if you prefer attribute access:

value = root.B[0].C[0]
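For comparison, here is a sketch of the same positional lookups with the standard library's xml.etree.ElementTree, which needs no third-party install. ET.fromstring() likewise returns the root <A> element. The compact, whitespace-free xml_string is an assumption standing in for the question's document.

```python
import xml.etree.ElementTree as ET

xml_string = "<A><B><C>blah1</C><C>blah2</C></B><B><C>blah3</C><C>blah4</C></B></A>"
root = ET.fromstring(xml_string)  # root is the <A> element itself

# 0-based list indexing, the counterpart of root.B[1].C[0] in objectify
value = root.findall("B")[1].findall("C")[0].text
print(value)

# ElementPath position predicates are 1-based: second <B>, then its first <C>
print(root.find("B[2]/C[1]").text)
```

Both lookups return the text "blah3"; the ElementPath form is handy when the path comes from a string rather than from code.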

answered Oct 17 '22 01:10

Lior

I usually parse XML using the ElementTree module in the standard library. It does not give you a dictionary; instead you get a much more useful DOM-like tree that lets you iterate over each element's children.

from xml.etree import ElementTree as ET

tree = ET.parse("<path-to-xml-file>")
root_element = tree.getroot()

for child in root_element:
    ...
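The loop above only visits the root's direct children. To visit every element when the structure is unknown, a small recursive generator is enough. The name walk is hypothetical, and the inline string stands in for the file (ET.parse(path).getroot() gives the same root_element for a file on disk).

```python
import xml.etree.ElementTree as ET

xml_string = """<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>"""

root_element = ET.fromstring(xml_string)

def walk(element, depth=0):
    """Yield (depth, tag, stripped text) for element and all descendants, depth-first."""
    yield depth, element.tag, (element.text or "").strip()
    for child in element:  # direct children, in document order
        yield from walk(child, depth + 1)

for depth, tag, text in walk(root_element):
    print("  " * depth + tag, text)
```

If you do not need the depth, root_element.iter() also visits every element in document order.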

If there is a specific need to parse it into a dictionary, rather than getting the information you need from the DOM tree, a recursive function that builds one starting from the root node (xml_dict(root_element)) could look like this:

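def xml_dict(node, path="", dic=None):
    if dic is None:
        dic = {}
    name_prefix = path + ("." if path else "") + node.tag
    # Highest index already used for this tag under the same parent,
    # e.g. keys "A1.B1" and "A1.B1.C2" both give index 1 for prefix "A1.B".
    numbers = {0}
    for similar_name in dic:
        if similar_name.startswith(name_prefix):
            suffix = similar_name[len(name_prefix):].split(".")[0]
            if suffix.isdigit():  # skip other tags sharing the prefix, e.g. "A1.BC1"
                numbers.add(int(suffix))
    name = name_prefix + str(max(numbers) + 1)
    # The node's own text plus each child's tail text, joined with "<...>"
    # (node.text and child.tail are None when there is no text there).
    dic[name] = (node.text or "") + "<...>".join(
        childnode.tail or "" for childnode in node)
    for childnode in node:
        xml_dict(childnode, name, dic)
    return dic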

For the XML you list above, this yields the following dictionary:

{'A1': '\n \n <...>\n',
 'A1.B1': '\n  \n  <...>\n ',
 'A1.B1.C1': '"blah"',
 'A1.B1.C2': '"blah"',
 'A1.B2': '\n  \n  <...>\n ',
 'A1.B2.C1': '"blah"',
 'A1.B2.C2': '"blah"'}

(I find the DOM form more useful)
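If you need exactly the key format from the question (bare root tag, 1-based counters per parent, only leaf elements stored), here is an alternative sketch; flatten is a hypothetical name. Instead of scanning the keys already in the dictionary as xml_dict does, it counts sibling tags with a collections.Counter per parent. It ignores attributes and strips surrounding whitespace from the text, and keys can become ambiguous if tag names themselves end in digits.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def flatten(element, path=None, result=None):
    """Map dotted paths such as "A.B1.C2" to the text of each leaf element."""
    if result is None:
        result = {}
    if path is None:
        path = element.tag  # the root keeps its bare tag, as in the question
    children = list(element)
    if not children:
        result[path] = (element.text or "").strip()
        return result
    seen = Counter()  # per-parent count of each tag name
    for child in children:
        seen[child.tag] += 1
        flatten(child, f"{path}.{child.tag}{seen[child.tag]}", result)
    return result

xml_string = """<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>"""

for key, value in flatten(ET.fromstring(xml_string)).items():
    print(key, "=", value)
```

For the question's XML this prints the four keys A.B1.C1, A.B1.C2, A.B2.C1 and A.B2.C2, each mapped to "blah".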

answered Oct 17 '22 00:10

jsbueno
