How do I formulate this XPath expression?

Question

Given the following div element:

<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="b">456</span>
    <span class="c">789</span>
</div>

I want to retrieve the contents of the span with class "b". However, some of the divs I want to parse lack the last two spans (those of class "b" and "c"). For these divs, I want the contents of the span with class "a". Is it possible to create a single XPath expression that selects this?

If it is not possible, is it possible to create a selector that retrieves the entire contents of the div? That is, one that retrieves:

<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>

If I can do that, I can use a regex to find the data I want. (I can select the text within the div, but I'm not sure how to select the tags also. Just the text yields 123456789.)

Dimitre Novatchev · Accepted Answer

A single expression that requires no union, and is therefore typically more efficient:

   //div/span
          [@class='b'
           or
             @class='a'
            and
             not(parent::*[span[@class='b']])
           ]

Because and has higher precedence than or in XPath, the predicate is read as @class='b' or (@class='a' and not(...)): a span of class "a" is selected only when its parent has no span child of class "b".

An expression (like the one below) that is the union of two absolute "//" expressions typically performs two complete traversals of the document tree, after which the union operation deduplicates the results and sorts them in document order. All of this can be significantly less efficient than a single tree traversal, unless the XPath processor has an intelligent optimizer.

An example of such an inefficient expression:

//div/span[@class='b'] | //div[not(./span[@class='b'])]/span[@class='a']

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//div/span
          [@class='b'
           or
             @class='a'
            and
             not(parent::*[span[@class='b']])
           ]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="b">456</span>
    <span class="c">789</span>
</div>

the XPath expression is evaluated and the selected elements (in this case just one) are copied to the output:

<span class="b">456</span>

When the same transformation is applied to a different XML document, in which no span has class='b':

<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="x">456</span>
    <span class="c">789</span>
</div>

the same XPath expression is evaluated and the correctly selected element is copied to the output:

<span class="a">123</span>
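
For readers without an XSLT processor at hand, the same check can be approximated in Python. This is only a sketch under stated assumptions: it uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath (no or and no not()), so the full expression cannot be passed to it directly. The "b, otherwise a" fallback is therefore written in Python rather than inside the expression. The names WITH_B, WITHOUT_B, select_spans and selected_text are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The two sample documents used in the XSLT verification above.
WITH_B = """<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="b">456</span>
    <span class="c">789</span>
</div>"""

WITHOUT_B = """<div class="info">
    <a href="/s/xyz.html" class="title">title</a>
    <span class="a">123</span>
    <span class="x">456</span>
    <span class="c">789</span>
</div>"""


def select_spans(root):
    """For each div: its span children of class 'b' if it has any,
    otherwise its span children of class 'a'."""
    selected = []
    # iter() also yields the root element itself, so a top-level div is covered.
    for div in root.iter("div"):
        spans_b = div.findall("span[@class='b']")
        # An empty list is falsy, so this falls back to class 'a'.
        selected.extend(spans_b or div.findall("span[@class='a']"))
    return selected


def selected_text(xml_source):
    """Parse the XML string and return the text of the selected spans."""
    return [span.text for span in select_spans(ET.fromstring(xml_source))]


print(selected_text(WITH_B))     # ['456']
print(selected_text(WITHOUT_B))  # ['123']
```

A processor with full XPath 1.0 support should accept the single expression from the answer as is, which makes the Python-side fallback unnecessary.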

Tags:

xpath

jela
