Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alter namespace prefixing with ElementTree in Python

By default, when you call ElementTree.parse(someXMLfile) the Python ElementTree library prefixes every parsed node with it's namespace URI in Clark's Notation:

    {http://example.org/namespace/spec}mynode

This makes accessing specific nodes by name a huge pain later in the code.

I've read through the docs on ElementTree and namespaces and it looks like the iterparse() function should allow me to alter the way the parser prefixes namespaces, but for the life of me I can't actually make it change the prefix. It seems like that may happen in the background before the ns-start event even fires as in this example:

for event, elem in iterparse(source):
    if event == "start-ns":
        namespaces.append(elem)
    elif event == "end-ns":
        namespaces.pop()
    else:
        ...

How do I make it change the prefixing behavior and what is the proper thing to return when the function ends?

like image 598
Gabriel Hurley Avatar asked Aug 08 '09 20:08

Gabriel Hurley


People also ask

What is ElementTree in Python?

ElementTree is an important Python library that allows you to parse and navigate an XML document. Using ElementTree breaks down the XML document in a tree structure that is easy to work with.

How do you add namespace prefix to XML element in Python?

When using prefixes in XML, a namespace for the prefix must be defined. The namespace can be defined by an xmlns attribute in the start tag of an element. The namespace declaration has the following syntax. xmlns:prefix="URI".

How to solve name conflict in XML?

In XML, elements name are defined by the developer so there is a chance to conflict in name of the elements. To avoid these types of confliction we use XML Namespaces. We can say that XML Namespaces provide a method to avoid element name conflict.


2 Answers

You don't specifically need to use iterparse. Instead, the following script:

from cStringIO import StringIO
import xml.etree.ElementTree as ET

NS_MAP = {
    'http://www.red-dove.com/ns/abc' : 'rdc',
    'http://www.adobe.com/2006/mxml' : 'mx',
    'http://www.red-dove.com/ns/def' : 'oth',
}

DATA = '''<?xml version="1.0" encoding="utf-8"?>
<rdc:container xmlns:mx="http://www.adobe.com/2006/mxml"
                 xmlns:rdc="http://www.red-dove.com/ns/abc"
                 xmlns:oth="http://www.red-dove.com/ns/def">
  <mx:Style>
    <oth:style1/>
  </mx:Style>
  <mx:Style>
    <oth:style2/>
  </mx:Style>
  <mx:Style>
    <oth:style3/>
  </mx:Style>
</rdc:container>'''

tree = ET.parse(StringIO(DATA))
some_node = tree.getroot().getchildren()[1]
print ET.fixtag(some_node.tag, NS_MAP)
some_node = some_node.getchildren()[0]
print ET.fixtag(some_node.tag, NS_MAP)

produces

('mx:Style', None)
('oth:style2', None)

Which shows how you can access the fully-qualified tag names of individual nodes in a parsed tree. You should be able to adapt this to your specific needs.

like image 91
Vinay Sajip Avatar answered Sep 21 '22 06:09

Vinay Sajip


xml.etree.ElementTree doesn't appear to have fixtag, well, not according to the documentation. However I've looked at some source code for fixtag and you do:

import xml.etree.ElementTree as ET

for event, elem in ET.iterparse(inFile, events=("start", "end")):
    namespace, looktag = string.split(elem.tag[1:], "}", 1)

You have the tag string in looktag, suitable for a lookup. The namespace is in namespace.

like image 29
elves Avatar answered Sep 20 '22 06:09

elves