Given the following div element:
<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
</div>
I want to retrieve the contents of the span with class "b". However, some of the divs I want to parse lack the last two spans (those of class "b" and "c"). For these divs, I want the contents of the span with class "a". Is it possible to create a single XPath expression that selects this?
If that is not possible, can I create a selector that retrieves the entire contents of the div, i.e. one that retrieves:
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
If I can do that, I can use a regex to find the data I want. (I can select the text within the div, but I'm not sure how to select the tags also. Just the text yields 123456789.)
A more efficient expression, which requires no union:
//div/span
[@class='b'
or
@class='a'
and
not(parent::*[span[@class='b']])
]
An expression like the one below, which is the union of two absolute "//" expressions, typically performs two complete traversals of the document tree, after which the union operation must deduplicate the results and sort them in document order. All of this can be significantly less efficient than a single tree traversal, unless the XPath processor has an intelligent optimizer.
An example of such an inefficient expression:
//div/span[@class='b'] | //div[not(./span[@class='b'])]/span[@class='a']
XSLT-based verification:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//div/span
     [@class='b'
     or
      @class='a'
      and
      not(parent::*[span[@class='b']])
     ]"/>
 </xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
</div>
The XPath expression is evaluated and the selected elements (in this case just one) are copied to the output:
<span class="b">456</span>
When the same transformation is applied on a different XML document, where there is no class='b':
<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="x">456</span>
<span class="c">789</span>
</div>
the same XPath expression is evaluated and the correctly selected element is copied to the output:
<span class="a">123</span>
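For readers without an XSLT processor at hand, the same selection rule can be checked with a rough Python sketch. It uses only the standard library's xml.etree.ElementTree, whose XPath subset does not support `or`, `and`, `not()` or the `parent::` axis, so the single expression above cannot be run as-is there. The fallback is written out procedurally instead. The helper name `select_spans` and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def select_spans(div):
    """Return the div's span children of class 'b', or, if it has none,
    its span children of class 'a' (the same rule as the XPath above)."""
    # findall() returns an empty list when nothing matches, so `or`
    # falls through to the class 'a' lookup only when there is no 'b'.
    return div.findall("span[@class='b']") or div.findall("span[@class='a']")


# The first sample document from the answer.
WITH_B = """<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
</div>"""

# The second sample document: class 'b' replaced by class 'x'.
WITHOUT_B = WITH_B.replace('class="b"', 'class="x"')

for doc in (WITH_B, WITHOUT_B):
    div = ET.fromstring(doc)
    for span in select_spans(div):
        print(span.get("class"), span.text)
```

With the two sample documents this prints `b 456` and then `a 123`, matching the XSLT output. A library with full XPath 1.0 support should be able to evaluate the single expression directly, with no need for the procedural fallback.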