Let's say I have an XML file as follows.
<A>
  <B>
    <C>"blah"</C>
    <C>"blah"</C>
  </B>
  <B>
    <C>"blah"</C>
    <C>"blah"</C>
  </B>
</A>
I need to read this file into a dictionary something like this.
dict["A.B1.C1"] = "blah"
dict["A.B1.C2"] = "blah"
dict["A.B2.C1"] = "blah"
dict["A.B2.C2"] = "blah"
But the format of the dict doesn't matter; I just want to read all the info into Python variables.
The thing is that I don't know the structure of the XML in advance; I just want to read all the info into a dictionary.
Is there any way to do this with Python?
To read an XML file with ElementTree, first import the xml.etree.ElementTree module from the standard library under the name ET (a common convention). Then pass the filename of the XML file to ET.parse() to parse it, and call getroot() on the resulting tree to get the root element (the outermost tag).
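As a minimal sketch of those steps, the snippet below parses a small document and walks every element without knowing the structure beforehand. The sample text and the use of io.StringIO in place of a real file are assumptions made only to keep the example self-contained; with a file on disk you would pass its path to ET.parse() instead.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, standing in for a file on disk.
XML_TEXT = """<A>
  <B><C>"blah"</C><C>"blah"</C></B>
  <B><C>"blah"</C><C>"blah"</C></B>
</A>"""

# ET.parse() accepts a filename or a file-like object.
tree = ET.parse(io.StringIO(XML_TEXT))
root = tree.getroot()
print(root.tag)

# iter() yields the root and all of its descendants in document order,
# so no knowledge of the nesting is needed.
tags = [element.tag for element in root.iter()]
print(tags)
```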
In a Python dictionary, the items() method returns all (key, value) pairs: as a view object in Python 3, and as a list in Python 2.
xmltodict.parse() is a function from the third-party xmltodict package (it is not built into Python) that converts XML into a Python dictionary.
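Since xmltodict is a separate install, here is a rough standard-library sketch of the same idea: turning an XML tree of unknown structure into nested dictionaries. This is not xmltodict's implementation, and the function name element_to_dict and the output shape (repeated tags collected into lists, attributes ignored) are my own assumptions.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Return the stripped text of a leaf, or a dict of child tags.

    Repeated child tags are collected into a list. Attributes and mixed
    text content are ignored in this sketch.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Second or later occurrence: promote the entry to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

root = ET.fromstring(
    '<A><B><C>"blah"</C><C>"blah"</C></B>'
    '<B><C>"blah"</C><C>"blah"</C></B></A>'
)
data = {root.tag: element_to_dict(root)}
print(data)
```

One thing to keep in mind with this shape: a tag that occurs once maps to a plain value, while a tag that occurs several times maps to a list, so code reading the result has to handle both.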
Note: When used on a dictionary, the all() function checks if all the keys are true, not the values.
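A quick way to see both dictionary notes in action; the settings dictionary here is made up purely for illustration.

```python
# Hypothetical dictionary with one falsy key (the empty string).
settings = {"A": "blah", "": "value behind an empty key"}

# items() yields (key, value) pairs; in Python 3 it is a view, not a list.
for key, value in settings.items():
    print(repr(key), "->", repr(value))

pairs = list(settings.items())

# all() iterates over the keys, so the empty-string key makes it False
# even though every value is truthy.
keys_all_true = all(settings)
values_all_true = all(settings.values())
print(keys_all_true, values_all_true)
```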
You can use the untangle library in Python. untangle.parse() converts an XML document into a Python object.
It takes an XML file as input and returns a Python object that represents that XML document.
Let's take the following XML file as an example and name it test_xml.xml:
<A>
  <B>
    <C>"blah1"</C>
    <C>"blah2"</C>
  </B>
  <B>
    <C>"blah3"</C>
    <C>"blah4"</C>
  </B>
</A>
Now let's convert the above XML file into a Python object to access its elements:
>>> import untangle
>>> input_file = "/home/tests/test_xml.xml"  # Full path to your XML file
>>> obj = untangle.parse(input_file)
>>> obj.A.B[0].C[0].cdata
u'"blah1"'
>>> obj.A.B[0].C[1].cdata
u'"blah2"'
>>> obj.A.B[1].C[0].cdata
u'"blah3"'
>>> obj.A.B[1].C[1].cdata
u'"blah4"'
I usually use the lxml.objectify library for quick XML parsing.
With your XML string, you can do:
from lxml import objectify
root = objectify.fromstring(xml_string)
And then get individual elements using a dictionary interface (fromstring() returns the root element <A> itself, so lookups start at its children):
value = root["B"][0]["C"][0]
Or, if you prefer:
value = root.B[0].C[0]
I usually parse XML using the ElementTree module in the standard library. It does not give you a dictionary; instead you get a much more useful DOM-like tree structure that lets you iterate over each element's children.
from xml.etree import ElementTree as ET
xml = ET.parse("<path-to-xml-file>")
root_element = xml.getroot()
for child in root_element:
    ...
If there is a specific need to parse it into a dictionary, instead of getting the information you need from the DOM tree, a recursive function that builds one from the root node would look something like this:
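def xml_dict(node, path="", dic=None):
    # Flatten an ElementTree node into keys such as "A1.B1.C1".
    if dic is None:
        dic = {}
    name_prefix = path + ("." if path else "") + node.tag
    # Collect the indexes already used by earlier siblings with this tag.
    numbers = set()
    for similar_name in dic.keys():
        if similar_name.startswith(name_prefix):
            suffix = similar_name[len(name_prefix):].split(".")[0]
            if suffix.isdigit():  # ignore other tags that merely share the prefix
                numbers.add(int(suffix))
    if not numbers:
        numbers.add(0)
    index = max(numbers) + 1
    name = name_prefix + str(index)
    # The node's own text, plus the tail text that follows each child.
    dic[name] = (node.text or "") + "<...>".join(
        childnode.tail if childnode.tail is not None else ""
        for childnode in node)
    for childnode in node:
        xml_dict(childnode, name, dic)
    return dic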
For the XML you list above this yields this dictionary:
{'A1': '\n \n <...>\n',
'A1.B1': '\n \n <...>\n ',
'A1.B1.C1': '"blah"',
'A1.B1.C2': '"blah"',
'A1.B2': '\n \n <...>\n ',
'A1.B2.C1': '"blah"',
'A1.B2.C2': '"blah"'}
(I find the DOM form more useful)
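If you want keys shaped like the ones in the question ("A.B1.C1") and only care about elements that actually hold text, a variation along these lines might do. It is only a sketch: the name flatten_leaves is made up, the root gets no index, whitespace is stripped, and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def flatten_leaves(node, path=None, out=None):
    """Map dotted paths such as 'A.B1.C2' to the text of leaf elements."""
    if out is None:
        out = {}
    if path is None:
        path = node.tag  # the root carries no index, as in the question
    children = list(node)
    if not children:
        out[path] = (node.text or "").strip()
        return out
    seen = Counter()  # per-parent count of each child tag
    for child in children:
        seen[child.tag] += 1
        flatten_leaves(child, f"{path}.{child.tag}{seen[child.tag]}", out)
    return out

# Hypothetical sample input with distinct values to make the mapping visible.
root = ET.fromstring(
    '<A><B><C>"blah1"</C><C>"blah2"</C></B>'
    '<B><C>"blah3"</C><C>"blah4"</C></B></A>'
)
flat = flatten_leaves(root)
for key, value in flat.items():
    print(key, "=", value)
```

Counting tags per parent avoids scanning the whole dictionary for existing keys, at the cost of not recording the whitespace-only text of container elements.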