
How do I formulate this XPath expression?

Tags:

xpath

Given the following div element:

<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="b">456</span>
    <span class="c">789</span>
</div>

I want to retrieve the contents of the span with class "b". However, some of the divs I want to parse lack the last two spans (of class "b" and "c"). For these divs, I want the contents of the span with class "a". Is it possible to create a single XPath expression that selects this?

If it is not possible, is it possible to create a selector that retrieves the entire contents of the div, i.e. one that retrieves

<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>

If I can do that, I can use a regex to find the data I want. (I can select the text within the div, but I'm not sure how to select the tags also. Just the text yields 123456789.)

jela asked Mar 05 '26 00:03



1 Answer

This expression requires no union and is therefore typically more efficient than a union-based alternative:

   //div/span
          [@class='b'
           or
             @class='a'
            and
             not(parent::*[span[@class='b']])
           ]

An expression that is the union of two absolute "//" expressions (like the one below) typically performs two complete traversals of the document tree, after which the union operation must deduplicate the results and sort them in document order. All this can be significantly less efficient than a single tree traversal, unless the XPath processor has an intelligent optimizer.

An example of such an inefficient expression:

//div/span[@class='b'] | //div[not(./span[@class='b'])]/span[@class='a'] 

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//div/span
          [@class='b'
           or
             @class='a'
            and
             not(parent::*[span[@class='b']])
           ]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="b">456</span>
    <span class="c">789</span>
</div>

The XPath expression is evaluated and the selected elements (in this case just one) are copied to the output:

<span class="b">456</span>

When the same transformation is applied to a different XML document, one that has no span with class='b':

<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="x">456</span>
    <span class="c">789</span>
</div>

the same XPath expression is evaluated and the correctly selected element is copied to the output:

<span class="a">123</span>
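
For readers who want to check the same selection rule outside XSLT, here is a minimal Python sketch. It is an assumption-laden illustration, not part of the verification above: it uses only the standard-library xml.etree.ElementTree module, whose XPath subset does not support `or`, `not()` or the `parent::` axis, so the fallback is written as ordinary Python logic per div rather than as the single expression. The names select_spans, WITH_B and WITHOUT_B are hypothetical, and results are grouped per div, which can differ from strict document order when divs are nested.

```python
import xml.etree.ElementTree as ET


def select_spans(xml_text):
    """Return the class 'b' spans of each div, or the class 'a' spans
    of any div that has no class 'b' span child."""
    root = ET.fromstring(xml_text)
    selected = []
    # root.iter("div") also yields the root element itself when it is a <div>.
    for div in root.iter("div"):
        spans_b = div.findall("span[@class='b']")
        if spans_b:
            selected.extend(spans_b)
        else:
            # Fallback: this div has no span of class 'b'.
            selected.extend(div.findall("span[@class='a']"))
    return selected


# The two sample documents used in the verification above.
WITH_B = """<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="b">456</span>
    <span class="c">789</span>
</div>"""

WITHOUT_B = WITH_B.replace('class="b"', 'class="x"')

for doc in (WITH_B, WITHOUT_B):
    for span in select_spans(doc):
        print(span.get("class"), span.text)
```

With a library that implements full XPath 1.0, the single expression could be evaluated directly instead; the loop here only mirrors its logic.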
Dimitre Novatchev answered Mar 08 '26 22:03
