The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
An XML namespace is a collection of names that can be used as element or attribute names in an XML document. The namespace qualifies element names uniquely on the Web in order to avoid conflicts between elements with the same name.
etree. ElementTree module and Minidom (Minimal DOM Implementation). Parsing means to read information from a file and split it into pieces by identifying parts of that particular XML file.
Instead of modifying the XML document itself, it's best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:
from io import StringIO # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET
# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
prefix, has_namespace, postfix = el.tag.partition('}')
if has_namespace:
el.tag = postfix # strip all namespaces
root = it.root
This is based on the discussion here: http://bugs.python.org/issue18304
Update: rpartition
instead of partition
makes sure you get the tag name in postfix
even if there is no namespace. Thus you could condense it:
for _, el in it:
_, _, el.tag = el.tag.rpartition('}') # strip ns
If you remove the xmlns attribute from the xml before parsing it then there won't be a namespace prepended to each tag in the tree.
import re
xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)
The answers so far explicitely put the namespace value in the script. For a more generic solution, I would rather extract the namespace from the xml:
import re
def get_namespace(element):
m = re.match('\{.*\}', element.tag)
return m.group(0) if m else ''
And use it in find method:
namespace = get_namespace(tree.getroot())
print tree.find('./{0}parent/{0}version'.format(namespace)).text
Here's an extension to @nonagon answer (which removes namespace from tags) to also remove namespace from attributes:
import io
import xml.etree.ElementTree as ET
# instead of ET.fromstring(xml)
it = ET.iterparse(io.StringIO(xml))
for _, el in it:
if '}' in el.tag:
el.tag = el.tag.split('}', 1)[1] # strip all namespaces
for at in list(el.attrib.keys()): # strip namespaces of attributes too
if '}' in at:
newat = at.split('}', 1)[1]
el.attrib[newat] = el.attrib[at]
del el.attrib[at]
root = it.root
Obviously this is a permanent defacing of the XML but if that's acceptable because there are no non-unique tag names and because you won't be writing the file needing the original namespaces then this can make accessing it a lot easier
Improving on the answer by ericspod:
Instead of changing the parse mode globally we can wrap this in an object supporting the with construct.
from xml.parsers import expat
class DisableXmlNamespaces:
def __enter__(self):
self.oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
def __exit__(self, type, value, traceback):
expat.ParserCreate = self.oldcreate
This can then be used as follows
import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
tree = ET.parse("test.xml")
The beauty of this way is that it does not change any behaviour for unrelated code outside the with block. I ended up creating this after getting errors in unrelated libraries after using the version by ericspod which also happened to use expat.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With