
Native shell command set to extract node value from XML

Tags: xml, xmllint

I'm trying to extract the value of a node from a pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project>
    <parent>
        <groupId>org.me.labs</groupId>
        <artifactId>my-random-project</artifactId>
        <version>1.5.0</version>
    </parent>
    ...
</project>

I need to extract the artifactId and version from the XML using a shell command. I have the following requirements/observations:

  1. The shell script will be done within a build assembly file we use at work, so the smaller the script the better.
  2. Since it'll be used on multiple systems (usually RHEL5), I'm looking for something that can run natively on default images.
  3. Tags like <version> can occur elsewhere in the pom, so I can't simply awk for those tags.

I have tried the following:

  1. xpath works on my Mac, but isn't available by default on RHEL machines. The same goes for xmllint --xpath, which I believe exists only in later versions of xmllint; I don't have those and can't require them.
  2. xmllint --pattern seemed promising, but I can't get usable output from it: xmllint --pattern '//project/parent/version' pom.xml prints the entire XML, and xmllint --stream --pattern '//project/parent/version' pom.xml prints nothing.

I realize this is a common question here on SO, but the points above are why I can't use those answers. TIA for your help.

Karthik V asked Jun 06 '13 10:06


2 Answers

--format is only used to format (indent, etc.) the document. You can extract the value with --xpath instead (tested on Ubuntu, libxml v20900):

$ xmllint --xpath "//project/parent/version/text()" pom.xml
1.5.0
Salem answered Sep 22 '22 16:09


I've managed to solve it for the time being with this rather unwieldy script using xmllint --shell.

echo "cat //project/parent/version" | xmllint --shell pom.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g' 

If the XML nodes carry namespace declarations (xmlns), as my pom.xml did, things get heavier: each node has to be matched by its local name:

echo "cat //*[local-name()='project']/*[local-name()='parent']/*[local-name()='version']" | xmllint --shell pom.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g' 

Hope it helps. If anyone can simplify these expressions, I'd be grateful.
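One possible simplification, offered as a sketch rather than a drop-in replacement: if the pom is laid out like the sample in the question, with <parent>, </parent> and each child element on its own line, sed alone can restrict the search to the <parent> block, so <version> tags elsewhere in the file are ignored. This does not parse XML; it only matches text. The helper name parent_value is hypothetical, and the layout assumption is mine, not something the question guarantees.

```shell
#!/bin/sh
# Sketch only: text matching with sed, not real XML parsing.
# Assumes <parent>, </parent> and each child element are on separate lines,
# and that <parent> has no attributes (as in the sample pom.xml).
# parent_value is a hypothetical helper name.
parent_value() {
    # $1 = child tag name (e.g. version), $2 = path to the pom file
    # First sed keeps only the lines from <parent> through </parent>;
    # second sed prints the text between <tag> and </tag> within them.
    sed -n '/<parent>/,/<\/parent>/p' "$2" |
        sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p"
}
```

With the sample pom.xml from the question, parent_value version pom.xml would print 1.5.0 and parent_value artifactId pom.xml would print my-random-project. Because the match is on literal tag text, an xmlns declaration on <project> does not get in the way, but a pom that puts several elements on one line would need the xmllint --shell approach above.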

Karthik V answered Sep 19 '22 16:09