I am trying to process a large number of XML files (Maven POMs) using xmllint --xpath. With some trial and error I figured out that it does not work as expected due to the bad default namespace declaration in these files, which is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
A simple command fails as follows:
$ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml )
XPath set is empty
If I get rid of the xmlns attribute, replacing the root element as follows:
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
The previous command gives the expected output:
$ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml )
4.0.0
Changing hundreds of pom files is not an option, especially since maven itself does not complain.
Is there a way for xmllint to process the file with the bad xmlns?
UPDATE
Thanks to Damien I was able to make some progress:
$ ( echo setns x=http://maven.apache.org/POM/4.0.0; echo 'xpath /x:project/x:modelVersion/text()'; ) | xmllint --shell pom.xml
/ > setns x=http://maven.apache.org/POM/4.0.0
/ > xpath /x:project/x:modelVersion/text()
Object is a Node Set :
Set contains 1 nodes:
1 TEXT
content=4.0.0
But this does not quite do what I need. My follow up questions are as follows:
Is there a way to print only the text? I would like the output to contain only 4.0.0 in the above example.
It seems the output gets truncated after about 30 characters. Is it possible to get the complete output? This does not happen with xmllint --xpath.
Strip the namespace with sed. Given this pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
</project>
this:
cat pom.xml | sed '2 s/xmlns=".*"//g' | xmllint --xpath '/project/modelVersion' -
returns this:
<modelVersion>4.0.0</modelVersion>
If you have funky formatting (for example, the xmlns attributes are on their own lines), run it through the formatter first so the root element ends up on line 2:
cat pom.xml | xmllint --format - | sed '2 s/xmlns=".*"//g' | xmllint --xpath '/project/modelVersion' -
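A note on the sed expression above: `.*` is greedy, so on the sample file it removes everything up to the last quote on line 2, which takes xmlns:xsi and xsi:schemaLocation with it. That happens to be harmless for this query. If you would rather remove only the default namespace declaration, on whatever line it appears, a narrower pattern might look like the sketch below. The function name strip_default_ns is made up for illustration, and the pattern assumes the attribute value is double-quoted, sits on one line, and is preceded by whitespace on that line.

```shell
#!/bin/sh
# Hypothetical helper: drop only the default namespace declaration
# (xmlns="...") from XML read on stdin. xmlns:xsi="..." is left alone
# because the pattern requires '=' immediately after 'xmlns'.
strip_default_ns() {
    sed 's/[[:space:]]xmlns="[^"]*"//'
}

# Demo on a one-line root element.
printf '%s\n' '<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">' \
    | strip_default_ns
```

With that in place, something like strip_default_ns < pom.xml | xmllint --xpath '/project/modelVersion/text()' - should behave like the pipeline above without depending on the line number.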
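Match on local-name() so the namespace does not matter. For the version declared in the parent element:
xmllint --xpath "/*[local-name() = 'project']/*[local-name() = 'parent']/*[local-name() = 'version']/text()" pom.xml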
For a top level pom.xml:
xmllint --xpath "/*[local-name() = 'project']/*[local-name() = 'version']/text()" pom.xml
It ain't real pretty, but it avoids formatting assumptions and/or re-formatting the input pom.xml file.
If you need to strip off the "-SNAPSHOT" for some reason, pipe the result of the above through | sed -e "s|-SNAPSHOT||".
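Since the original goal was to run this over a large number of POMs, here is a rough sketch of how the two local-name() queries could be combined in a loop. The function name pom_version and the tab-separated output are my own choices, not anything from xmllint. Wrapping the path in string(...) makes xmllint print just the text value, and gives an empty string instead of "XPath set is empty" when the element is missing, which is what the fallback to the parent version relies on.

```shell
#!/bin/sh
# Hypothetical helper: print the version of one POM, ignoring its namespace.
pom_version() {
    v=$(xmllint --xpath "string(/*[local-name()='project']/*[local-name()='version'])" "$1")
    if [ -z "$v" ]; then
        # No <version> of its own: fall back to the one in <parent>.
        v=$(xmllint --xpath "string(/*[local-name()='project']/*[local-name()='parent']/*[local-name()='version'])" "$1")
    fi
    printf '%s\n' "$v"
}

# Print "path<TAB>version" for every POM given on the command line.
for pom in "$@"; do
    printf '%s\t%s\n' "$pom" "$(pom_version "$pom")"
done
```

Saved under some name of your choosing (say pom-versions.sh), it could be run as sh pom-versions.sh */pom.xml, or through find . -name pom.xml -exec sh pom-versions.sh {} + for a deeper tree.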