 

Python ElementTree default namespace?


Is there a way to define the default (unprefixed) namespace in Python's ElementTree? This doesn't seem to work:

    import xml.etree.ElementTree

    ns = {"": "http://maven.apache.org/POM/4.0.0"}
    pom = xml.etree.ElementTree.parse("pom.xml")
    print(pom.findall("version", ns))

Nor does this:

    import xml.etree.ElementTree

    ns = {None: "http://maven.apache.org/POM/4.0.0"}
    pom = xml.etree.ElementTree.parse("pom.xml")
    print(pom.findall("version", ns))

This does, but then I have to prefix every element:

    import xml.etree.ElementTree

    ns = {"mvn": "http://maven.apache.org/POM/4.0.0"}
    pom = xml.etree.ElementTree.parse("pom.xml")
    print(pom.findall("mvn:version", ns))

Using Python 3.5 on OSX.

EDIT: if the answer is "no", you can still get the bounty :-). I just want a definitive "no" from someone who's spent a lot of time using it.

asked Nov 30 '15 at 23:11 by Robert Fraser




2 Answers

NOTE: for Python 3.8+, see the other answer on this page, which covers the empty-string prefix.


There is no straightforward way to handle default namespaces transparently. Mapping the default namespace URI to a non-empty prefix is a common solution, as you've already mentioned:

    import xml.etree.ElementTree

    ns = {"mvn": "http://maven.apache.org/POM/4.0.0"}
    pom = xml.etree.ElementTree.parse("pom.xml")
    print(pom.findall("mvn:version", ns))
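For reference, here is a self-contained sketch of the prefix approach. It parses a small inline POM string instead of pom.xml (the inline document and its version value are made up for illustration). It also shows why the unprefixed query finds nothing: ElementTree stores namespaced tags in `{uri}localname` form (sometimes called Clark notation), so plain `version` never matches `{http://maven.apache.org/POM/4.0.0}version`.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"

# Hypothetical minimal POM; a real pom.xml has many more elements.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.2.3</version>
</project>"""

root = ET.fromstring(POM_XML)

# Namespaced tags are stored as {namespace-uri}localname.
print(root.tag)  # {http://maven.apache.org/POM/4.0.0}project

# Without a namespace mapping, an unprefixed name does not match namespaced tags.
print(root.findall("version"))  # []

# Map the default namespace URI to an arbitrary prefix and use it in the path.
ns = {"mvn": POM_NS}
print(root.find("mvn:version", ns).text)  # 1.2.3

# Equivalent: spell out the full {uri}localname tag, no mapping needed.
print(root.find("{%s}version" % POM_NS).text)  # 1.2.3
```

The `{uri}localname` form is handy for one-off lookups; the prefix mapping keeps longer paths readable.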

Note that lxml.etree explicitly rejects an empty namespace prefix. You would get:

    ValueError: empty namespace prefix is not supported in ElementPath


You can, though, make things simpler by removing the default namespace declaration while loading the XML input data:

    import xml.etree.ElementTree as ET
    import re

    with open("pom.xml") as f:
        xmlstring = f.read()

    # Remove the default namespace definition (xmlns="http://some/namespace")
    xmlstring = re.sub(r'\sxmlns="[^"]+"', '', xmlstring, count=1)

    pom = ET.fromstring(xmlstring)
    print(pom.findall("version"))
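The regex works on the raw text, so it only removes a double-quoted `xmlns="..."`, and with `count=1` it removes just the first match, which could in principle sit inside a comment before the root element. A different technique, not the one this answer uses, is to parse normally and then strip the `{uri}` part from every element tag. A minimal sketch, assuming you no longer need the namespace information after parsing; `strip_namespaces` is a hypothetical helper name and the inline POM is made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal POM for illustration.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.2.3</version>
  <dependencies>
    <dependency>
      <artifactId>junit</artifactId>
    </dependency>
  </dependencies>
</project>"""


def strip_namespaces(root):
    """Hypothetical helper: rewrite every '{uri}local' element tag to 'local' in place.

    Only element tags are changed; namespaced attribute names are left as they are.
    """
    for el in root.iter():
        # Comments/processing instructions (if kept) have a function as .tag, not a str.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(POM_XML))

print(root.find("version").text)  # 1.2.3
print([a.text for a in root.iter("artifactId")])  # ['junit']
```

This leaves the parser to handle the namespace declaration, at the cost of one extra pass over the tree.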
answered Sep 23 '22 at 08:09 by alecxe


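Starting with Python 3.8, ElementTree accepts the empty string as a prefix, so you can declare:

    ns = {'': 'http://maven.apache.org/POM/4.0.0'}

and pass it as the namespaces argument of the find* methods (find(), findall(), iterfind() and findtext()). Unprefixed element names in the path are then resolved against that namespace.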

Source: https://docs.python.org/3.8/library/xml.etree.elementtree.html?highlight=xml#xml.etree.ElementTree.Element.find
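A self-contained sketch, assuming Python 3.8 or later and using a made-up inline POM string in place of pom.xml. Note that findtext() takes a `default` parameter before `namespaces`, so passing the mapping by keyword avoids accidentally supplying it as the default value.

```python
import sys
import xml.etree.ElementTree as ET

# The empty-string prefix is only honoured by ElementTree's path engine on 3.8+.
if sys.version_info < (3, 8):
    raise RuntimeError("this example needs Python 3.8+")

# Hypothetical minimal POM for illustration.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <version>1.2.3</version>
  <dependencies>
    <dependency><artifactId>junit</artifactId></dependency>
    <dependency><artifactId>mockito-core</artifactId></dependency>
  </dependencies>
</project>"""

# '' maps unprefixed element names in the path to the POM namespace.
ns = {"": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM_XML)

print(root.find("version", ns).text)                          # 1.2.3
print(root.findtext("version", namespaces=ns))                # 1.2.3
print([a.text for a in root.findall(".//artifactId", ns)])    # ['junit', 'mockito-core']
```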

answered Sep 26 '22 at 08:09 by delocalizer