Reading Maven Pom xml in Python

Question

I have a pom file that has the following defined:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

<modelVersion>4.0.0</modelVersion>
<groupId>org.welsh</groupId>
<artifactId>my-site</artifactId>
<version>1.0.0</version>
<packaging>pom</packaging>

<profiles>
    <profile>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.welsh.utils</groupId>
                    <artifactId>site-tool</artifactId>
                    <version>1.0</version>
                    <executions>
                        <execution>
                            <configuration>
                                <mappings>
                                    <property>
                                        <name>homepage</name>
                                        <value>/content/homepage</value>
                                    </property>
                                    <property>
                                        <name>assets</name>
                                        <value>/content/assets</value>
                                    </property>
                                </mappings>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </profile>
</profiles>
</project>

I want to build a dictionary from the name and value elements of each property under the mappings element.

I'm trying to figure out how to get all possible mappings elements (in case there are multiple build profiles) so that I can get all property elements under them. From reading about the supported XPath syntax, I expected the following to print every name/value pair:

import logging
import xml.etree.ElementTree as xml

logging.basicConfig(level=logging.INFO)  # make INFO messages visible

pomFile = xml.parse('pom.xml')
root = pomFile.getroot()

for mapping in root.findall('*/mappings'):
    for prop in mapping.findall('.//property'):
        logging.info(prop.find('name').text + " => " + prop.find('value').text)

This returns nothing. I tried printing just the mappings elements and got:

>>> print root.findall('*/mappings')
[]

When I print everything directly under root, I get:

>>> print root.findall('*')
[<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x10b38bd50>, <Element '{http://maven.apache.org/POM/4.0.0}groupId' at 0x10b38bd90>, <Element '{http://maven.apache.org/POM/4.0.0}artifactId' at 0x10b38bf10>, <Element '{http://maven.apache.org/POM/4.0.0}version' at 0x10b3900d0>, <Element '{http://maven.apache.org/POM/4.0.0}packaging' at 0x10b390110>, <Element '{http://maven.apache.org/POM/4.0.0}name' at 0x10b390150>, <Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x10b390190>, <Element '{http://maven.apache.org/POM/4.0.0}build' at 0x10b390310>, <Element '{http://maven.apache.org/POM/4.0.0}profiles' at 0x10b390390>]

Which made me try to print:

>>> print root.findall('*/{http://maven.apache.org/POM/4.0.0}mappings')
[]

But that's not working either.

Any suggestions would be great.

Thanks,

JojOatXGME · Accepted Answer

The main issues with the code in the question are

that it doesn't specify namespaces, and
that it uses */, which descends exactly one level, instead of //, which matches descendants at any depth.

As you can see at the top of the XML file, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The attribute xmlns in the root node defines the default namespace. The attribute xmlns:xsi defines a namespace that is only used for xsi:schemaLocation.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
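To see what this means for ElementTree, here is a small sketch. The inline POM_SNIPPET string is only a shortened stand-in for the real pom.xml, not part of the original question. The parser stores every tag name together with its namespace, in the form {namespace}tag, which is why a plain tag name finds nothing.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the pom.xml from the question (assumption: inline string).
POM_SNIPPET = """<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <groupId>org.welsh</groupId>
</project>"""

root = ET.fromstring(POM_SNIPPET)

# Elements inherit the default namespace, so their tags are namespace-qualified.
print(root.tag)     # {http://maven.apache.org/POM/4.0.0}project
print(root[0].tag)  # {http://maven.apache.org/POM/4.0.0}groupId

# The xsi prefix only qualifies the schemaLocation attribute.
print(list(root.attrib))

# A plain tag name matches nothing; the qualified name does.
plain = root.find('groupId')
qualified = root.find('{http://maven.apache.org/POM/4.0.0}groupId')
print(plain, qualified.text)  # None org.welsh
```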

To select tags like profile with methods such as find, you have to include the namespace as well. For example, you could write the following to find all profile tags.

import xml.etree.ElementTree as xml

pom = xml.parse('pom.xml')
# './/' is the relative form ElementTree expects for a search at any depth
for profile in pom.findall('.//{http://maven.apache.org/POM/4.0.0}profile'):
    print(repr(profile))

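Also note that I'm searching at any depth (// in XPath, written .// in ElementTree when the path is relative to the current node). Using */ would give the same result for profile in your specific XML file above, because profile sits exactly two levels below the root. However, it would not work for deeper tags like mappings. Since * stands for exactly one level, */tag can match parent/tag or xyz/tag but not xyz/parent/tag.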
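The difference is easy to check on a toy document. This is a minimal sketch with a made-up XML string that has no namespaces, so only the path behaviour is visible:

```python
import xml.etree.ElementTree as ET

# Tiny made-up document without namespaces, to isolate the path behaviour.
doc = ET.fromstring(
    "<root>"
    "<a><tag>grandchild</tag></a>"
    "<b><c><tag>great-grandchild</tag></c></b>"
    "</root>"
)

# '*/tag' only looks exactly two levels below the starting element.
one_level = [e.text for e in doc.findall('*/tag')]

# './/tag' searches the whole subtree, at any depth.
any_level = [e.text for e in doc.findall('.//tag')]

print(one_level)  # ['grandchild']
print(any_level)  # ['grandchild', 'great-grandchild']
```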

Now, you should be able to come up with something like this to find all mappings:

pom = xml.parse('pom.xml')
map = {}
# every property directly under a mappings element, at any depth
for mapping in pom.findall('.//{http://maven.apache.org/POM/4.0.0}mappings'
                           '/{http://maven.apache.org/POM/4.0.0}property'):
    name  = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
    value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
    map[name] = value

Specifying the namespaces like this is quite verbose. To make it easier to read, you can define a namespace map and pass it as the second argument to find and findall:

# ...
nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
for mapping in pom.findall('.//m:mappings/m:property', nsmap):
    name  = mapping.find('m:name', nsmap).text
    value = mapping.find('m:value', nsmap).text
    map[name] = value
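Put together, the approach could look like the following self-contained sketch. The inline POM string is a reduced stand-in for the real file, and read_mappings and NSMAP are illustrative names, not part of any library. It uses findtext so that a property without a name or value child does not raise an AttributeError.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for the pom.xml from the question (assumption: inline string).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <build>
        <plugins>
          <plugin>
            <executions>
              <execution>
                <configuration>
                  <mappings>
                    <property>
                      <name>homepage</name>
                      <value>/content/homepage</value>
                    </property>
                    <property>
                      <name>assets</name>
                      <value>/content/assets</value>
                    </property>
                  </mappings>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>
  </profiles>
</project>"""

# Prefix 'm' is arbitrary; only the URI has to match the document.
NSMAP = {'m': 'http://maven.apache.org/POM/4.0.0'}


def read_mappings(root):
    """Collect name/value pairs from every mappings/property element."""
    result = {}
    for prop in root.findall('.//m:mappings/m:property', NSMAP):
        # findtext returns None instead of raising when a child is missing.
        name = prop.findtext('m:name', namespaces=NSMAP)
        value = prop.findtext('m:value', namespaces=NSMAP)
        if name is not None:
            result[name] = value
    return result


mappings = read_mappings(ET.fromstring(POM))
print(mappings)
```

For a file on disk, ET.parse('pom.xml').getroot() can be passed to read_mappings instead of the parsed string.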

Welsh · Answer

Ok, I found out that when I remove the Maven namespace declarations from the project element, so it's just <project>, I can do this:

for mapping in root.findall('*//mappings'):
    logging.info(mapping)
    for prop in mapping.findall('./property'):
        logging.info(prop.find('name').text + " => " + prop.find('value').text)

Which would result in:

INFO:root:<Element 'mappings' at 0x10d72d350>
INFO:root:homepage => /content/homepage
INFO:root:assets => /content/assets

However, if I leave the Maven namespace declarations in at the top, I can do this instead:

for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
    logging.info(mapping)
    for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
        logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text + " => " + prop.find('{http://maven.apache.org/POM/4.0.0}value').text)

Which results in:

INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
INFO:root:homepage => /content/homepage
INFO:root:assets => /content/assets

However, I'd love to figure out how to avoid having to account for the Maven namespace, since it locks me into this one format.
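One possible way around this, assuming Python 3.8 or newer (the question is tagged python-2.7, where this is not available), is the {*} wildcard that ElementTree accepts in paths. It matches a tag in any namespace or in none, so the same code works with and without the Maven declarations. A minimal sketch with made-up inline documents:

```python
import xml.etree.ElementTree as ET

# Made-up, shortened documents: one with the Maven namespace, one without.
WITH_NS = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build><mappings>
    <property><name>homepage</name><value>/content/homepage</value></property>
  </mappings></build>
</project>"""
WITHOUT_NS = WITH_NS.replace(' xmlns="http://maven.apache.org/POM/4.0.0"', '')


def read_mappings_any_ns(root):
    """Collect name/value pairs regardless of the document's namespace."""
    found = {}
    # '{*}tag' matches 'tag' in any namespace, or with no namespace at all.
    for prop in root.findall('.//{*}mappings/{*}property'):
        found[prop.findtext('{*}name')] = prop.findtext('{*}value')
    return found


with_ns = read_mappings_any_ns(ET.fromstring(WITH_NS))
without_ns = read_mappings_any_ns(ET.fromstring(WITHOUT_NS))
print(with_ns)
print(without_ns)
```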

EDIT:

Ok, I managed to get something a bit more verbose:

import xml.etree.ElementTree as xml

def getMappingsNode(node, nodeName):
    # Depth-first search: return the first descendant whose tag contains nodeName
    for n in node.findall('*'):
        if nodeName in n.tag:
            return n
        found = getMappingsNode(n, nodeName)
        if found is not None:
            return found
    return None

def getMappings(rootNode):
    mappingsNode = getMappingsNode(rootNode, 'mappings')
    mapping = {}

    for prop in mappingsNode.findall('*'):
        key = ''
        val = ''

        for child in prop.findall('*'):
            if 'name' in child.tag:
                key = child.text

            if 'value' in child.tag:
                val = child.text

        if val and key:
            mapping[key] = val

    return mapping

pomFile = xml.parse('pom.xml')
root = pomFile.getroot()

mappings = getMappings(root)
print(mappings)

Tags: python, xml, xml-parsing, python-2.7
