I need to load an XML file and convert the contents into an object-oriented Python structure. I want to take this:
    <main> <object1 attr="name">content</object1> </main>
And turn it into something like this:
    main
    main.object1 = "content"
    main.object1.attr = "name"
The XML data will have a more complicated structure than that and I can't hard code the element names. The attribute names need to be collected when parsing and used as the object properties.
How can I convert XML data into a Python object?
To read an XML file with ElementTree, first import the ElementTree module from the standard library's xml.etree package under the name ET (a common convention). Then pass the filename of the XML file to ET.parse() to parse it, and call getroot() on the result to get the root (top-level) element.
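The steps above can be sketched roughly as follows. The sample document and the variable names are made up for illustration, and an in-memory file object stands in for a real file on disk:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative sample; a real file path could be passed to ET.parse() instead.
xml_text = '<main><object1 attr="name">content</object1><test>me</test></main>'

tree = ET.parse(io.StringIO(xml_text))  # parse() accepts a filename or a file object
root = tree.getroot()                   # the top-level <main> element

print(root.tag)
for child in root:
    # tag is the element name, attrib a dict of its attributes, text its text content
    print(child.tag, child.attrib, child.text)
```

Because the loop reads each child's tag and attrib at run time, none of the element or attribute names has to be hard-coded.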
Python lets you parse and modify XML documents. The minidom parser loads the entire XML document into memory in order to parse it. In this tutorial, we will see how to use the xml.dom.minidom module to load and parse an XML file.
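As a minimal sketch of the minidom approach (the sample string and variable names are assumptions for illustration, not part of the original tutorial):

```python
from xml.dom import minidom

# Illustrative sample; minidom.parse(path) would read from a file instead.
xml_text = '<main><object1 attr="name">content</object1><test>me</test></main>'

dom = minidom.parseString(xml_text)  # the whole document is built in memory
root = dom.documentElement           # the top-level <main> element

first = root.getElementsByTagName("object1")[0]
attr_value = first.getAttribute("attr")  # value of the XML attribute "attr"
text = first.firstChild.data             # text node inside <object1>
print(first.tagName, attr_value, text)
```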
It's worth looking at lxml.objectify.

    xml = """<main>
    <object1 attr="name">content</object1>
    <object1 attr="foo">contenbar</object1>
    <test>me</test>
    </main>"""

    from lxml import objectify

    main = objectify.fromstring(xml)
    main.object1[0]              # content
    main.object1[1]              # contenbar
    main.object1[0].get("attr")  # name
    main.test                    # me
Or the other way around to build xml structures:
    from lxml import etree, objectify

    item = objectify.Element("item")
    item.title = "Best of python"
    item.price = 17.98
    item.price.set("currency", "EUR")

    order = objectify.Element("order")
    order.append(item)
    order.item.quantity = 3
    order.price = sum(item.price * item.quantity for item in order.item)

    # encoding="unicode" returns str rather than bytes, so print() shows plain XML
    print(etree.tostring(order, pretty_print=True, encoding="unicode"))
Output:
    <order>
      <item>
        <title>Best of python</title>
        <price currency="EUR">17.98</price>
        <quantity>3</quantity>
      </item>
      <price>53.94</price>
    </order>
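If installing lxml is not an option, similar attribute-style access can be approximated with the standard library alone. This is only a sketch, not how lxml.objectify works internally: the Node class and its behaviour are hypothetical, invented here for illustration.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper: child elements are looked up by name at run time."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal lookup fails; never resolve private names here
        if name.startswith("_"):
            raise AttributeError(name)
        matches = [Node(child) for child in self._element.findall(name)]
        if not matches:
            raise AttributeError(name)
        # A single match is returned directly, several matches as a list
        return matches[0] if len(matches) == 1 else matches

    def get(self, key, default=None):
        # XML attributes are read by name, mirroring the .get("attr") call above
        return self._element.get(key, default)

    def __str__(self):
        return (self._element.text or "").strip()


xml_text = """<main>
<object1 attr="name">content</object1>
<object1 attr="foo">contenbar</object1>
<test>me</test>
</main>"""

main = Node(ET.fromstring(xml_text))
print(main.object1[0])              # first <object1> element's text
print(main.object1[0].get("attr"))  # its "attr" attribute
print(main.test)                    # text of the single <test> element
```

One design caveat: a name that occurs once comes back as a single Node and a repeated name as a list, so code that does not know the document in advance has to handle both cases.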
I've been recommending this more than once today, but try Beautiful Soup (pip install beautifulsoup4).

    from bs4 import BeautifulSoup

    xml = """
    <main>
        <object attr="name">content</object>
    </main>
    """

    # the "xml" parser requires lxml to be installed
    soup = BeautifulSoup(xml, "xml")

    # look in the main node for objects with attr=name,
    # optionally look up attrs with regex
    my_objects = soup.main.find_all("object", attrs={"attr": "name"})
    for my_object in my_objects:
        # this will print a list of the contents of the tag
        print(my_object.contents)
        # if only text is inside the tag you can use this
        # print(my_object.string)