By default, when you call ElementTree.parse(someXMLfile), the Python ElementTree library prefixes every parsed node with its namespace URI in Clark notation:
{http://example.org/namespace/spec}mynode
This makes accessing specific nodes by name a huge pain later in the code.
I've read through the docs on ElementTree and namespaces, and it looks like the iterparse() function should let me alter the way the parser prefixes namespaces, but for the life of me I can't make it change the prefix. It seems the prefixing may happen in the background before the start-ns event even fires, as in this example:
for event, elem in iterparse(source):
    if event == "start-ns":
        namespaces.append(elem)
    elif event == "end-ns":
        namespaces.pop()
    else:
        ...
How do I make it change the prefixing behavior and what is the proper thing to return when the function ends?
ElementTree is a Python standard-library module for parsing and navigating XML documents. It represents the document as a tree of elements that is easy to work with.
When an XML document uses prefixes, a namespace must be defined for each prefix. The namespace can be declared with an xmlns attribute in the start tag of an element, using the syntax xmlns:prefix="URI".
In XML, element names are chosen by the developer, so names from different vocabularies can conflict. XML namespaces provide a way to avoid these element name conflicts.
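As a small illustration of the declaration syntax above, the following sketch parses a document that declares one prefix and shows how ElementTree stores the resulting tags. The prefix "ex" and the element names are made up for the example; the URI is the one from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prefix "ex" is bound to an example URI.
DOC = (
    '<ex:root xmlns:ex="http://example.org/namespace/spec">'
    '<ex:mynode/>'
    '</ex:root>'
)

root = ET.fromstring(DOC)

# ElementTree replaces the prefix with the URI in Clark notation.
print(root.tag)  # {http://example.org/namespace/spec}root

# A prefix-to-URI mapping lets find() accept the shorter prefixed form.
ns = {"ex": "http://example.org/namespace/spec"}
child = root.find("ex:mynode", ns)
print(child.tag)  # {http://example.org/namespace/spec}mynode
```

The prefix used in the mapping does not have to match the one in the document; only the URI matters for the lookup.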
You don't specifically need to use iterparse. Instead, the following script:
# Python 2 script: cStringIO and the print statement are Python 2 only.
from cStringIO import StringIO
import xml.etree.ElementTree as ET

# Maps each namespace URI to the prefix to display for it.
NS_MAP = {
    'http://www.red-dove.com/ns/abc' : 'rdc',
    'http://www.adobe.com/2006/mxml' : 'mx',
    'http://www.red-dove.com/ns/def' : 'oth',
}

DATA = '''<?xml version="1.0" encoding="utf-8"?>
<rdc:container xmlns:mx="http://www.adobe.com/2006/mxml"
    xmlns:rdc="http://www.red-dove.com/ns/abc"
    xmlns:oth="http://www.red-dove.com/ns/def">
  <mx:Style>
    <oth:style1/>
  </mx:Style>
  <mx:Style>
    <oth:style2/>
  </mx:Style>
  <mx:Style>
    <oth:style3/>
  </mx:Style>
</rdc:container>'''

tree = ET.parse(StringIO(DATA))
# Second child of the root: the second <mx:Style> element.
some_node = tree.getroot().getchildren()[1]
# fixtag is an undocumented helper from older ElementTree releases;
# it is not available in Python 3.
print ET.fixtag(some_node.tag, NS_MAP)
some_node = some_node.getchildren()[0]
print ET.fixtag(some_node.tag, NS_MAP)
produces
('mx:Style', None)
('oth:style2', None)
This shows how you can map the fully qualified tag names of individual nodes in a parsed tree back to their prefixed forms. You should be able to adapt this to your specific needs.
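For readers on Python 3, where fixtag is not available, the same idea can be sketched by hand. This is only an approximation of what the script above does: prefixed_tag is a hypothetical helper written for this example, not part of ElementTree, and the sample document is shortened.

```python
import xml.etree.ElementTree as ET

# URI -> prefix mapping, same shape as NS_MAP in the script above.
NS_MAP = {
    "http://www.red-dove.com/ns/abc": "rdc",
    "http://www.adobe.com/2006/mxml": "mx",
    "http://www.red-dove.com/ns/def": "oth",
}

def prefixed_tag(tag, ns_map):
    """Hypothetical helper: turn '{uri}local' into 'prefix:local'."""
    if not tag.startswith("{"):
        return tag  # tag has no namespace, nothing to do
    uri, local = tag[1:].split("}", 1)
    prefix = ns_map.get(uri)
    # Leave the tag untouched when the URI is not in the mapping.
    return f"{prefix}:{local}" if prefix else tag

DATA = """<rdc:container xmlns:mx="http://www.adobe.com/2006/mxml"
    xmlns:rdc="http://www.red-dove.com/ns/abc"
    xmlns:oth="http://www.red-dove.com/ns/def">
  <mx:Style><oth:style1/></mx:Style>
  <mx:Style><oth:style2/></mx:Style>
</rdc:container>"""

root = ET.fromstring(DATA)
second_style = list(root)[1]  # list(elem) replaces getchildren()
print(prefixed_tag(second_style.tag, NS_MAP))     # mx:Style
print(prefixed_tag(second_style[0].tag, NS_MAP))  # oth:style2
```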
xml.etree.ElementTree doesn't appear to have fixtag, at least not according to the documentation. However, going by the source code for fixtag, you can split the tag yourself:
import xml.etree.ElementTree as ET

for event, elem in ET.iterparse(inFile, events=("start", "end")):
    # Drop the leading "{" and split at the closing "}" of the Clark-notation tag.
    namespace, looktag = elem.tag[1:].split("}", 1)
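You now have the bare tag name in looktag, suitable for a lookup, and the namespace URI in namespace. Note that this assumes every tag carries a namespace; a tag without one has no "}" to split on, so the unpacking would fail.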
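Building on the splitting idea, here is one possible sketch of what the question asks for: stripping the namespace part from every tag while parsing, and handing back the root element. parse_without_namespaces is a hypothetical helper and the sample document is invented for the example. It only rewrites element tags, not namespaced attribute names, and it rewrites on "end" events so that each element is complete before its tag changes.

```python
import io
import xml.etree.ElementTree as ET

def parse_without_namespaces(source):
    """Hypothetical helper: parse source and strip '{uri}' from every tag."""
    prefixes = {}  # prefix -> URI, collected from start-ns events
    root = None
    for event, payload in ET.iterparse(source, events=("end", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # start-ns yields a (prefix, uri) tuple
            prefixes[prefix] = uri
        else:
            # "end": payload is a fully parsed Element.
            if payload.tag.startswith("{"):
                payload.tag = payload.tag.split("}", 1)[1]
            root = payload  # the last "end" event belongs to the root
    return root, prefixes

xml_bytes = b'<a:root xmlns:a="http://example.org/a"><a:child/></a:root>'
root, prefixes = parse_without_namespaces(io.BytesIO(xml_bytes))
print(root.tag)     # root
print(root[0].tag)  # child
print(prefixes)     # {'a': 'http://example.org/a'}
```

Returning the root element (or wrapping it in ET.ElementTree(root)) gives the caller the same kind of object that ET.parse(...).getroot() would. Be aware that dropping namespaces can make elements from different vocabularies indistinguishable, so this is only reasonable when the names are known not to collide.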