xpath multiple tag select

For the given XML, how can I select c, d, g, and h (the ones that are descendants of b, not those under j) using XPath?

XML

<a>
 <b>
  <c>select me</c>
  <d>select me</d>
  <e>do not select me</e>
  <f>
    <g>select me</g>
    <h>select me</h>
  </f>
 </b>

 <j>
  <c>select me</c>
  <d>select me</d>
  <e>do not select me</e>
  <f>
    <g>select me</g>
    <h>select me</h>
  </f>
 </j>
</a>

I thought of using the following to get the result, but it doesn't give me the g and h values:

xpath.compile("//a/b/*[self::c or self::d or self::f]/text()");

The Java code I used:

import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;

public class XPathDemo {

    public static void main(String[] args)
            throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {

        // Parse test.xml into a DOM document
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("test.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();

        XPathExpression expr = xpath.compile("//a/b/*[self::c or self::d or self::f]/text()");

        // Evaluate as a node set and print the value of each selected text node
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }
    }
}

Can anyone help me with this?

Thanks a lot!!!

Pavithra Gunasekara asked Dec 05 '22 21:12


2 Answers

Use this XPath if you want to select all c, d, g, and h nodes anywhere in the document (including those under j):

"//c|//d|//g|//h"

Use this, if you want to specify the full path from the root:

"/a/b/c|/a/b/d|/a/b/f/g|/a/b/f/h"

Or, if you want all c, d, g, or h nodes that are within b:

"//b//c|//b//d|//b//g|//b//h"

Also, in your code, use nodes.item(i).getTextContent() instead of getNodeValue(): these expressions select element nodes, and getNodeValue() returns null for an element.
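
As a rough sketch of the last expression in use, the program below evaluates it against an inline copy of the question's XML and reads each result with getTextContent(). The class name UnionXPathDemo, the helper select, and the shortened text values (b-c, j-c, and so on) are made up for illustration so that the b and j branches can be told apart; they are not part of the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnionXPathDemo {

    // Inline copy of the question's structure; the text values are
    // hypothetical and only mark which branch (b or j) a node came from.
    static final String XML =
        "<a>"
      + "<b><c>b-c</c><d>b-d</d><e>b-e</e><f><g>b-g</g><h>b-h</h></f></b>"
      + "<j><c>j-c</c><d>j-d</d><e>j-e</e><f><g>j-g</g><h>j-h</h></f></j>"
      + "</a>";

    // Evaluates an XPath expression against XML and returns the text
    // content of every selected node.
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() works for element nodes; getNodeValue() would be null
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Only the c, d, g, h elements inside b are selected
        System.out.println(select("//b//c|//b//d|//b//g|//b//h"));
    }
}
```

Passing "//c|//d|//g|//h" to the same helper would also return the four matching nodes under j, which is the difference between the first and third expressions.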

Petar Ivanov answered Dec 25 '22 00:12


Use:

 //a/b/*[not(self::e or self::f)]
|
 //a/b/*/*[self::g or self::h]

If you know the structure of the XML document well, and the only grandchildren that //a/b can have are g and/or h, this can be simplified to:

 //a/b/*[not(self::e or self::f)]
|
 //a/b/*/*

In XPath 2.0 this can be written even more simply as:

 //a/b/(*[not(self::e or self::f)] | */*)
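
A minimal sketch of the XPath 1.0 union form in Java follows, using the JDK's javax.xml.xpath API, which implements XPath 1.0 (the XPath 2.0 form would need a different processor). It also evaluates the element part of the question's expression to show why g and h were missing there: the predicate keeps f itself, and /text() then looks only at f's own text children, not at the g and h elements inside it. The class name ChildAndGrandchildDemo, the helper selectNames, and the compact inline XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildAndGrandchildDemo {

    // Compact inline copy of the question's structure (text values shortened).
    static final String XML =
        "<a>"
      + "<b><c>1</c><d>2</d><e>3</e><f><g>4</g><h>5</h></f></b>"
      + "<j><c>6</c><d>7</d><e>8</e><f><g>9</g><h>10</h></f></j>"
      + "</a>";

    // The XPath 1.0 union from the answer: children of b other than e and f,
    // plus grandchildren of b named g or h.
    static final String UNION =
        "//a/b/*[not(self::e or self::f)] | //a/b/*/*[self::g or self::h]";

    // Returns the element name of every node selected by the expression.
    static List<String> selectNames(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Union from the answer: selects c, d, g and h under b
        System.out.println(selectNames(UNION));
        // Element step of the question's expression: selects c, d and f, never g or h
        System.out.println(selectNames("//a/b/*[self::c or self::d or self::f]"));
    }
}
```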

Dimitre Novatchev answered Dec 25 '22 01:12