This XML file is named example.xml:
    <?xml version="1.0"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>14.0.0</modelVersion>
      <groupId>.com.foobar.flubber</groupId>
      <artifactId>uberportalconf</artifactId>
      <version>13-SNAPSHOT</version>
      <packaging>pom</packaging>
      <name>Environment for UberPortalConf</name>
      <description>This is the description</description>
      <properties>
        <birduberportal.version>11</birduberportal.version>
        <promotiondevice.version>9</promotiondevice.version>
        <foobarportal.version>6</foobarportal.version>
        <eventuberdevice.version>2</eventuberdevice.version>
      </properties>
      <!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
    </project>
If I load example.xml and parse it with ElementTree, I can see that its namespace is http://maven.apache.org/POM/4.0.0.
    >>> from xml.etree import ElementTree
    >>> tree = ElementTree.parse('example.xml')
    >>> print(tree.getroot())
    <Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>
I have not found a method that returns just the namespace of an Element without resorting to parsing str(an_element). It seems like there has to be a better way.
There are two common ways to parse XML with the ElementTree module: the parse() function and the fromstring() function. parse() reads an XML document from a file (given a filename or a file object), whereas fromstring() parses XML supplied as a string, for example a triple-quoted literal.
To read an XML file using ElementTree, first import the xml.etree.ElementTree module, conventionally under the name ET. Then pass the filename of the XML file to ET.parse() to parse it, and call getroot() on the returned tree to get the root (top-level) element.
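As a minimal sketch of the two entry points described above: XML_TEXT is a cut-down, made-up version of the POM from the question, and io.BytesIO is only an assumption standing in for a real file such as example.xml.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for example.xml.
XML_TEXT = """<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>uberportalconf</artifactId>
</project>"""

# fromstring() takes the document as a string and returns the root Element.
root_from_string = ET.fromstring(XML_TEXT)

# parse() takes a filename or file object and returns an ElementTree;
# getroot() then gives the root Element. BytesIO plays the role of a file.
tree = ET.parse(io.BytesIO(XML_TEXT.encode("utf-8")))
root_from_file = tree.getroot()

print(root_from_string.tag)
print(root_from_file.tag)
```

Both calls end up with the same root element; the only difference is that parse() wraps it in an ElementTree object first.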
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
This is a perfect task for a regular expression.
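    import re

    def namespace(element):
        # A namespaced tag looks like '{uri}localname'; match the leading '{uri}' part.
        m = re.match(r'\{.*\}', element.tag)
        # Return the namespace including its braces, or '' if the tag has none.
        return m.group(0) if m else ''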
The namespace should be in Element.tag, right before the "actual" tag:
    >>> root = tree.getroot()
    >>> root.tag
    '{http://maven.apache.org/POM/4.0.0}project'
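Since the tag has the form '{uri}localname', the namespace can also be pulled out with plain string methods, without a regular expression. The following is only a sketch: split_tag is a hypothetical helper name, not part of ElementTree, and unlike a match on the braces it returns the URI without them.

```python
import xml.etree.ElementTree as ET

def split_tag(element):
    """Return (namespace_uri, local_name); namespace_uri is '' if the tag has none."""
    tag = element.tag
    # Comments and processing instructions have a non-string tag, so check the type.
    if isinstance(tag, str) and tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag

# Made-up, shortened document using the namespace from the question.
root = ET.fromstring(
    '<project xmlns="http://maven.apache.org/POM/4.0.0"><name>x</name></project>'
)
print(split_tag(root))
```

Returning both parts at once can be handy when the local name is needed as well, for example when comparing tags across documents that use different namespaces.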
To know more about namespaces, take a look at ElementTree: Working with Namespaces and Qualified Names.
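As a rough illustration of working with such qualified names, child elements can be looked up either with a prefix mapping passed to find() or with the full '{uri}name' form. The prefix "pom" and the shortened POM_XML below are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for example.xml.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>uberportalconf</artifactId>
  <version>13-SNAPSHOT</version>
</project>"""

root = ET.fromstring(POM_XML)

# "pom" is an arbitrary label chosen here; only the URI has to match the document.
ns = {"pom": "http://maven.apache.org/POM/4.0.0"}

# Lookup through the prefix mapping.
artifact_id = root.find("pom:artifactId", ns).text

# Equivalent lookup using the fully qualified '{uri}name' form.
version = root.find("{http://maven.apache.org/POM/4.0.0}version").text

print(artifact_id, version)
```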