
Emitting namespace specifications with ElementTree in Python

I am trying to emit an XML file with ElementTree that contains an XML declaration and namespaces. Here is my sample code:

    from xml.etree import ElementTree as ET
    ET.register_namespace('com', "http://www.company.com")  # some name

    # build a tree structure
    root = ET.Element("STUFF")
    body = ET.SubElement(root, "MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"

    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)

    tree.write("page.xml",
               xml_declaration=True,
               method="xml")

However, neither the <?xml declaration nor any namespace/prefix information appears in the output. I'm more than a little confused here.

Paul Nathan asked Feb 14 '11 22:02


2 Answers

Although the docs say otherwise, I was only able to get an <?xml ...?> declaration by specifying both xml_declaration and encoding.

You have to declare nodes in the namespace you've registered to get the namespace on the nodes in the file. Here's a fixed version of your code:

    from xml.etree import ElementTree as ET
    ET.register_namespace('com', "http://www.company.com")  # some name

    # build a tree structure
    root = ET.Element("{http://www.company.com}STUFF")
    body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"

    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)

    tree.write("page.xml",
               xml_declaration=True, encoding='utf-8',
               method="xml")

Output (page.xml)

    <?xml version='1.0' encoding='utf-8'?>
    <com:STUFF xmlns:com="http://www.company.com"><com:MORE_STUFF>STUFF EVERYWHERE!</com:MORE_STUFF></com:STUFF>
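The `{uri}tag` strings can also be built with `ET.QName` rather than typed by hand. The following is only a sketch: the `qname` helper and the `NS` constant are names introduced here for illustration, not part of the original answer.

```python
from xml.etree import ElementTree as ET

NS = "http://www.company.com"  # namespace URI from the question
ET.register_namespace("com", NS)


def qname(tag):
    """Hypothetical helper: return the tag in '{uri}tag' form."""
    return ET.QName(NS, tag).text


# Same tree as in the fixed code, with qualified names built by the helper
root = ET.Element(qname("STUFF"))
body = ET.SubElement(root, qname("MORE_STUFF"))
body.text = "STUFF EVERYWHERE!"

# encoding="unicode" returns a str instead of bytes (Python 3)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the prefix is registered, the serializer should write `com:` on every element that lives in that namespace.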

ElementTree doesn't pretty-print either. Here's pretty-printed output:

    <?xml version='1.0' encoding='utf-8'?>
    <com:STUFF xmlns:com="http://www.company.com">
        <com:MORE_STUFF>STUFF EVERYWHERE!</com:MORE_STUFF>
    </com:STUFF>
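On newer interpreters the indentation does not have to be added by hand. A minimal sketch, assuming Python 3.9 or later, where `ET.indent` is available (it postdates this answer); the `NS` constant is a name introduced here for illustration:

```python
from xml.etree import ElementTree as ET

NS = "http://www.company.com"  # namespace URI from the question
ET.register_namespace("com", NS)

root = ET.Element("{%s}STUFF" % NS)
body = ET.SubElement(root, "{%s}MORE_STUFF" % NS)
body.text = "STUFF EVERYWHERE!"

# ET.indent (Python 3.9+) modifies the tree in place, adding
# whitespace-only text/tail values so nested elements are indented
ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Note that `ET.indent` changes the tree itself, so it is best called just before writing.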

You can also declare a default namespace, in which case you don't need to register a prefix:

    from xml.etree import ElementTree as ET

    # build a tree structure
    root = ET.Element("{http://www.company.com}STUFF")
    body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"

    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)

    tree.write("page.xml",
               xml_declaration=True, encoding='utf-8',
               method="xml", default_namespace='http://www.company.com')

Output (pretty-print spacing is mine)

    <?xml version='1.0' encoding='utf-8'?>
    <STUFF xmlns="http://www.company.com">
        <MORE_STUFF>STUFF EVERYWHERE!</MORE_STUFF>
    </STUFF>
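The prefixed and default-namespace forms should describe the same elements; only the spelling in the file differs. A small sketch to check that by parsing both back (the variable names and the `c` lookup prefix are chosen here for illustration):

```python
from xml.etree import ElementTree as ET

NS = "http://www.company.com"  # namespace URI from the question

prefixed = (
    '<com:STUFF xmlns:com="%s">'
    '<com:MORE_STUFF>STUFF EVERYWHERE!</com:MORE_STUFF>'
    '</com:STUFF>' % NS
)
defaulted = (
    '<STUFF xmlns="%s">'
    '<MORE_STUFF>STUFF EVERYWHERE!</MORE_STUFF>'
    '</STUFF>' % NS
)

root_a = ET.fromstring(prefixed)
root_b = ET.fromstring(defaulted)

# Both parse to the same namespace-qualified tag
print(root_a.tag)
print(root_a.tag == root_b.tag)

# When searching, the prefix in the mapping is local to the lookup;
# it does not have to match the prefix used in the document
child = root_b.find("c:MORE_STUFF", {"c": NS})
print(child.text)
```

So the choice between a prefix and a default namespace is mostly about how the file reads, not about what a namespace-aware parser sees.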
Mark Tolonen answered Oct 11 '22 07:10


I've never been able to get the <?xml declaration out of the ElementTree libraries programmatically, so I'd suggest writing it yourself, something like this:

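    from xml.etree import ElementTree as ET

    root = ET.Element("STUFF")
    # 'xmlns:com' declares the prefix; a bare 'com' would only add an ordinary attribute
    root.set('xmlns:com', 'http://www.company.com')
    body = ET.SubElement(root, "MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"

    # encoding='unicode' makes tostring() return str (Python 3), so it can be concatenated
    with open('page.xml', 'w', encoding='utf-8') as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(root, encoding='unicode'))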
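If the declaration is wanted in memory rather than in a file, one option is to let `ElementTree.write` target a bytes buffer. This is a sketch under the assumption of Python 3; the `buffer` and `xml_bytes` names are illustrative only.

```python
import io
from xml.etree import ElementTree as ET

root = ET.Element("STUFF")
body = ET.SubElement(root, "MORE_STUFF")
body.text = "STUFF EVERYWHERE!"

# write() accepts any object with a write() method, so an in-memory
# bytes buffer can stand in for a file on disk
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)

xml_bytes = buffer.getvalue()
print(xml_bytes.decode("utf-8"))
```

This keeps the declaration and the encoding of the serialized bytes in agreement, which hand-written concatenation does not guarantee.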

ElementTree implementations outside the standard library may specify namespaces differently, so if you decide to move to lxml, the way you declare them will change.

Philip Southam answered Oct 11 '22 07:10