
Is there a better way of getting parent node of XPath query result?

Tags:

dom

xml

xpath

Having markup like this:

<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>

I want to get every <a> element or "some text" node, but ONLY if the adjacent span has class a1. So the final result should be the <a> from the first div and "some text" from the third one. It would be easy if the <a> and the text were inside the span, or if the divs had a class attribute, but no luck.

What I am doing now is looking for a span with class a1:

//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]

then I get its parent and run another query() with that parent as the context node. This looks far from efficient, so the question is: is there a better way to accomplish my goal?


THE ANSWER ADDENDUM

As per @MarcB's accepted answer, the right query to use is:

//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..

but for <a> it may be better to use:

//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/../a

to get the <a> itself instead of its container.

Marcin Orlowski asked Sep 10 '25 08:09


2 Answers

The nice thing about xpath queries is that you can essentially treat them like a file system path, so simply having

//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
                                                              ^^

will find all your .a1 nodes that are below a .foo node, then move up one level to the a1 nodes' parents.
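
As a rough illustration of the `/..` step, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. Two things in it are assumptions and not part of the original answer: ElementTree implements only a subset of XPath with no `contains()`, so the exact match `@class='a1'` is swapped in, and the markup is wrapped in a hypothetical `<body>` element so that `.//div` can reach the outer div.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <body> element:
# ElementTree's ".//" only searches below the start node, not the node itself.
MARKUP = """
<body>
<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>
</body>
""".strip()

root = ET.fromstring(MARKUP)

# ".." steps from each matching span up to its parent div.
# Exact attribute match stands in for contains(), which ElementTree lacks.
parents = root.findall(".//div[@class='foo']/div/span[@class='a1']/..")

# Appending "/a" after ".." selects the links next to an a1 span directly.
links = root.findall(".//div[@class='foo']/div/span[@class='a1']/../a")

print([p.tag for p in parents])
print([a.get("href") for a in links])
```

With the sample markup this should select the first and third inner divs as parents, and only the first div's link, which matches what the question asks for.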

Marc B answered Sep 13 '25 06:09


A better expression, which uses a predicate on the div instead of a reverse (parent) axis:

//div[contains(@class,'foo')]/div[span[contains(@class,'a1')]]

This selects any div that is a child of a div whose class attribute contains the string "foo" and that (the selected div) has a span child whose class attribute contains the string "a1".

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//div[contains(@class,'foo')]
          /div[span[contains(@class,'a1')]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>

the XPath expression is evaluated and the selected elements are copied to the output:

<div>
   <span class="a1"/>
   <a href="...">...</a>
</div>
<div>
   <span class="a1"/>some text</div>
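
The same idea, filtering the div by what it contains instead of walking back up from the span, can be sketched in Python with the standard-library `xml.etree.ElementTree`. This is only an approximation under stated assumptions: ElementTree does not support nested predicates or `contains()`, so the predicate is emulated with a helper function, and the names `has_a1_span` and `payload` are hypothetical.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>
""".strip()

# Assumption: the outer div.foo is the document root in this sketch.
foo = ET.fromstring(MARKUP)


def has_a1_span(div):
    """Emulate the predicate [span[@class='a1']]: test the div, never leave it."""
    return div.find("span[@class='a1']") is not None


def payload(div):
    """Return the <a> child if there is one, else the text that follows the span."""
    link = div.find("a")
    if link is not None:
        return link
    span = div.find("span[@class='a1']")
    # In ElementTree, text after a closing tag is stored as that element's tail.
    return (span.tail or "").strip()


selected = [div for div in foo.findall("div") if has_a1_span(div)]
results = [payload(div) for div in selected]

for item in results:
    print(item if isinstance(item, str) else ET.tostring(item, encoding="unicode"))
```

For the sample document this yields the `<a>` element from the first div and the string "some text" from the third, which is the result the question was after.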

II. Remarks on accessing an HTML element by one of its classes:

If it is known that the element can have only one class, then it isn't necessary to use contains() at all.

Don't use:

//div[contains(@class, 'foo')]

Use:

//div[@class = 'foo']

or, if there could be leading/trailing spaces, use:

//div[normalize-space(@class) = 'foo']

A crucial issue with:

//div[contains(@class, 'foo')]

is that this selects any div with class such as "myfoo", "foo2" or "myfoo3".

If the element may have more than one class, and to avoid the above issue, the correct XPath expression is:

//div[contains(concat(' ', @class, ' '), ' foo ')]
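
To make the difference concrete, the two tests can be mimicked on plain strings in Python. This is a sketch only: the function names `contains_substring` and `has_class_token` are hypothetical, and the strings stand in for the value of `@class`.

```python
def contains_substring(class_attr: str, name: str) -> bool:
    """Mimics contains(@class, 'name'): a plain substring test."""
    return name in class_attr


def has_class_token(class_attr: str, name: str) -> bool:
    """Mimics contains(concat(' ', @class, ' '), ' name ').

    Padding both sides with spaces means the name only matches when it is
    delimited by spaces, i.e. when it is a whole class token.
    """
    return f" {name} " in f" {class_attr} "


for value in ["foo", "bar foo baz", "myfoo", "foo2", "myfoo3"]:
    print(
        f"{value!r}: substring={contains_substring(value, 'foo')}, "
        f"token={has_class_token(value, 'foo')}"
    )
```

The substring test accepts all five values, while the padded test accepts only "foo" and "bar foo baz". Note that the padded form assumes class names are separated by single spaces; if tabs or newlines may occur, wrapping `@class` in `normalize-space()` inside the `concat()` covers that case.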

Dimitre Novatchev answered Sep 13 '25 04:09