Having markup like this:
<div class="foo">
<div><span class="a1"></span><a href="...">...</a></div>
<div><span class="a2"></span><a href="...">...</a></div>
<div><span class="a1"></span>some text</div>
<div><span class="a3"></span>some text</div>
</div>
I am interested in getting all <a> and some text ONLY if the adjacent span is of class a1. So at the end of the whole code my result should be the <a> from the first div and some text from the third one. It would be easy if <a> and some text were inside the span, or if the div had a class attribute, but no luck.
What I am doing now is looking for a span with class a1:
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]
Then I get its parent and do another query() with that parent as the context node. This looks far from efficient, so the question is: is there a better way to accomplish my goal?
THE ANSWER ADDENDUM
As per @MarcB's accepted answer, the right query to use is:
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
but for <a> it may be better to use:
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/../a
to get the <a> instead of its container.
The nice thing about XPath queries is that you can essentially treat them like a file system path, so simply having
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
                                                              ^^
will find all your .a1 nodes that are below a .foo node, then move up one level to the a1 nodes' parents.
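The parent step can be tried out with a minimal sketch in Python's standard library. This is only an illustration under some assumptions: ElementTree supports a small XPath subset without contains(), so the class test below is an exact match, and the path is written relative to the root element because ElementTree does not accept absolute paths on elements. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

MARKUP = """<div class="foo">
<div><span class="a1"></span><a href="...">...</a></div>
<div><span class="a2"></span><a href="...">...</a></div>
<div><span class="a1"></span>some text</div>
<div><span class="a3"></span>some text</div>
</div>"""

root = ET.fromstring(MARKUP)  # root is the outer <div class="foo">

# Find the a1 spans, then step up to their parents with "..".
# Assumption: exact attribute match stands in for contains(@class, 'a1').
parents = root.findall("./div/span[@class='a1']/..")

results = []
for parent in parents:
    link = parent.find("a")
    if link is not None:
        results.append(("a", link.get("href")))
    else:
        # ElementTree stores text that follows the span as the span's tail.
        span = parent.find("span")
        results.append(("text", (span.tail or "").strip()))

print(results)
```

A single query returns both containers, so no second query per parent is needed; the loop only picks the <a> or the trailing text out of each one.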
An expression that is better than using a reverse axis:
//div[contains(@class,'foo')]/div[span[contains(@class,'a1')]]
This selects any div that is a child of a div whose class attribute contains the string "foo" and that (the selected div) has a span child whose class attribute contains the string "a1".
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"//div[contains(@class,'foo')]
/div[span[contains(@class,'a1')]]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<div class="foo">
<div><span class="a1"></span><a href="...">...</a></div>
<div><span class="a2"></span><a href="...">...</a></div>
<div><span class="a1"></span>some text</div>
<div><span class="a3"></span>some text</div>
</div>
the XPath expression is evaluated and the selected elements are copied to the output:
<div>
<span class="a1"/>
<a href="...">...</a>
</div>
<div>
<span class="a1"/>some text</div>
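For readers without an XSLT processor at hand, the same "div that has an a1 span child" selection can be approximated in Python's standard library. This is a rough sketch, not an equivalent of the XPath above: ElementTree does not support nested predicates or contains(), so the predicate is emulated with a list comprehension and an exact class match, and the names used are illustrative only.

```python
import xml.etree.ElementTree as ET

MARKUP = """<div class="foo">
<div><span class="a1"></span><a href="...">...</a></div>
<div><span class="a2"></span><a href="...">...</a></div>
<div><span class="a1"></span>some text</div>
<div><span class="a3"></span>some text</div>
</div>"""

root = ET.fromstring(MARKUP)  # root is the outer <div class="foo">

# Emulates div[span[@class='a1']]: keep each child div that has a matching
# span child, without ever navigating back up to a parent.
selected = [
    div for div in root.findall("./div")
    if div.find("span[@class='a1']") is not None
]

for div in selected:
    print(ET.tostring(div, encoding="unicode").strip())
```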
II. Remarks on accessing an HTML element by one of its classes:
If it is known that the element can have only one class, then it isn't necessary at all to use contains().
Don't use:
//div[contains(@class, 'foo')]
Use:
//div[@class = 'foo']
or, if there could be leading/trailing spaces, use:
//div[normalize-space(@class) = 'foo']
A crucial issue with:
//div[contains(@class, 'foo')]
is that this also selects any div with a class such as "myfoo", "foo2" or "myfoo3".
If the element may have more than one class, and to avoid the above issue, the correct XPath expression is:
//div[contains(concat(' ', @class, ' '), ' foo ')]
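The difference between the two tests is easy to see on plain strings. The sketch below mimics both XPath expressions with ordinary Python string operations; the helper names xpath_contains and xpath_token_match are hypothetical and exist only for this illustration.

```python
def xpath_contains(class_attr, name):
    """Mimics contains(@class, 'name'): a plain substring test."""
    return name in class_attr


def xpath_token_match(class_attr, name):
    """Mimics contains(concat(' ', @class, ' '), ' name ')."""
    return f" {name} " in f" {class_attr} "


samples = ["foo", "foo bar", "bar foo", "myfoo", "foo2", "myfoo3"]

for value in samples:
    print(
        f"{value!r:10} contains: {xpath_contains(value, 'foo')!s:5} "
        f"token match: {xpath_token_match(value, 'foo')}"
    )
```

The substring test accepts every sample, while the space-padded test accepts only the values in which foo appears as a whole, space-separated class name.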