Force xmllint to ignore bad default xmlns

Tags: xmllint

I am trying to process a large number of XML files (Maven POMs) using xmllint --xpath. After some trial and error, I figured out that it does not work as expected because of the bad default namespace declaration in these files, which is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

A simple command fails as follows:

$ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml )
XPath set is empty

If I get rid of the xmlns attribute, replacing the root element as follows:

<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

The previous command gives the expected output:

$ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml )
4.0.0

Changing hundreds of POM files is not an option, especially since Maven itself does not complain.

Is there a way to make xmllint process the file with the bad xmlns?

UPDATE

Thanks to Damien I was able to make some progress:

$ ( echo setns x=http://maven.apache.org/POM/4.0.0; echo 'xpath /x:project/x:modelVersion/text()'; ) | xmllint --shell pom.xml
/ > setns x=http://maven.apache.org/POM/4.0.0
/ > xpath /x:project/x:modelVersion/text()
Object is a Node Set :
Set contains 1 nodes:
1  TEXT
    content=4.0.0

But this does not quite do what I need. My follow-up questions are as follows:

  1. Is there a way to print only the text? I would like the output to contain only 4.0.0 in the above example.

  2. The output seems to get truncated after about 30 characters. Is it possible to get the complete output? This does not happen with xmllint --xpath.
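
For the first follow-up question, one option is to filter the shell output down to the part after content=. This is only a minimal sketch based on the output format shown above; only_content is a hypothetical helper name, and it does nothing about the truncation in the second question, because it only sees what the xpath command has already printed.

```shell
# Keep only the text that follows "content=" in `xmllint --shell` xpath output.
# only_content is a hypothetical helper name, not part of xmllint.
only_content() {
  sed -n 's/^[[:space:]]*content=//p'
}

# Usage sketch (assumes xmllint is installed and pom.xml exists):
#   printf '%s\n' 'setns x=http://maven.apache.org/POM/4.0.0' \
#                 'xpath /x:project/x:modelVersion/text()' |
#     xmllint --shell pom.xml | only_content
```

The filter prints nothing for lines without content=, so the prompts and the "Object is a Node Set" header are dropped.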

Miserable Variable, asked Feb 12 '15 09:02


2 Answers

Strip the namespace with sed.

Given this pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
</project>

this:

cat pom.xml | sed '2 s/xmlns=".*"//g' | xmllint --xpath '/project/modelVersion' -

returns this:

<modelVersion>4.0.0</modelVersion>

If you have funky formatting (for example, the xmlns attributes are on their own lines), run the file through the formatter first:

cat pom.xml | xmllint --format - | sed '2 s/xmlns=".*"//g' | xmllint --xpath '/project/modelVersion' -
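
One caveat with the pattern above: .* is greedy, so it removes everything from xmlns=" through the last double quote on that line, which on a one-line root element also takes out xmlns:xsi and xsi:schemaLocation. A narrower pattern that drops only the default namespace declaration could look like the sketch below. The root variable is just a sample for illustration, and the pattern assumes the attribute is preceded by a single space.

```shell
# Sample root element on a single line, as in the pom.xml shown above.
root='<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">'

# [^"]* stops at the first closing quote, so only the default xmlns attribute goes.
# The leading space is consumed too, so no double space is left behind.
stripped=$(printf '%s\n' "$root" | sed 's/ xmlns="[^"]*"//')
printf '%s\n' "$stripped"
```

In the pipeline, this would replace the sed stage, for example: sed '2 s/ xmlns="[^"]*"//' pom.xml | xmllint --xpath '/project/modelVersion' -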

djeikyb, answered Oct 24 '22 09:10


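Match elements on local-name() so that the namespace is ignored. To read the parent version from a pom.xml:

xmllint --xpath "/*[local-name() = 'project']/*[local-name() = 'parent']/*[local-name() = 'version']/text()" pom.xml

For the project's own version in a top-level pom.xml: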

xmllint --xpath "/*[local-name() = 'project']/*[local-name() = 'version']/text()" pom.xml

It ain't real pretty, but it avoids formatting assumptions and/or re-formatting the input pom.xml file.

If you need to strip off the "-SNAPSHOT" for some reason, pipe the result of the above through | sed -e "s|-SNAPSHOT||".
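
Since these local-name() expressions get long, a small helper can build them from a plain path. This is a sketch only: local_name_xpath is a hypothetical function name, and it assumes simple element names separated by slashes, with no predicates, attributes, or // steps.

```shell
# Turn "project/parent/version" into
# /*[local-name() = 'project']/*[local-name() = 'parent']/*[local-name() = 'version']
# local_name_xpath is a hypothetical helper, not an xmllint feature.
local_name_xpath() {
  printf '%s\n' "$1" |
    sed -e 's|^/*||' \
        -e "s|[^/][^/]*|*[local-name() = '&']|g" \
        -e 's|^|/|'
}

# Usage sketch (assumes xmllint is installed and the files exist):
#   for f in */pom.xml; do
#     printf '%s: %s\n' "$f" \
#       "$(xmllint --xpath "$(local_name_xpath project/version)/text()" "$f")"
#   done
```

The first sed expression drops any leading slash, the second wraps each path step in a local-name() test, and the third puts the root slash back.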

Charlie Reitzel, answered Oct 24 '22 09:10