XQuery: // vs descendant-or-self::node()

Recently I needed to evaluate an XQuery against a node of an HTML document. Basically, I needed to select all elements with an href attribute from the first child of the body element. Here is a small example to explain:

<html>
    <body>
        <a href="http://www.google.be"/>
    </body>
</html>

In this case, the desired result is:

<a href="http://www.google.be"/>

My first idea was to use //body/*[1]//*[@href] because:

  • //body matches the body element, wherever it is
  • /*[1] matches the first child of the body element
  • //*[@href] matches all descendants or self of the current element

I figured that would work, but on the example provided the XQuery returns no results.

However, I read up a bit and found the following (source: http://www.keller.com/xslt/8/):

Alternate notation for "//": descendant-or-self::node()

So I changed my XQuery to //body/*[1]/descendant-or-self::node()[@href] and this time, the results were correct.

My question: what is the difference between // and descendant-or-self::node()? What I found here (What's the difference between //node and /descendant::node in xpath?) and here (http://www.w3.org/TR/xpath/#axes) says:

// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para.

Which leads me to conclude that // and /descendant-or-self::node() are not interchangeable (probably because of the terminating / at the end?), but then can someone tell me if there is a shorthand for /descendant-or-self::node()?

RDM asked Jan 20 '14 17:01


2 Answers

Your first XPath expression (//body/*[1]//*[@href]) actually does what you described in natural language: //body/*[1] is the first child of the body element, and //*[@href] then selects every element below it that has an @href attribute. It never tests that first child itself.

In your example, there is no element below the anchor tag having such an attribute. For example, this query would match the anchor in the following document:

<html>
    <body>
        <p>
            <a href="http://www.google.be"/>
        </p>
    </body>
</html>

The non-abbreviated version of this query is:

//body/*[1]/descendant-or-self::node()/*[@href]

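Contrast this with your second query, where the [@href] predicate is applied to the descendant-or-self nodes themselves (so the first child is included) rather than to their children: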

//body/*[1]/descendant-or-self::node()[@href]
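
The difference can be reproduced with a small sketch in Python. It is only an approximation: the standard library's xml.etree.ElementTree supports a subset of XPath and has no descendant-or-self axis, so the second query is emulated with Element.iter(), and the helper names (first_child_of_body, below_only, self_or_below) are made up for this illustration. The helper also assumes body is a direct child of the root element, which is narrower than //body.

```python
import xml.etree.ElementTree as ET

# The two documents from this thread.
FLAT = '<html><body><a href="http://www.google.be"/></body></html>'
NESTED = '<html><body><p><a href="http://www.google.be"/></p></body></html>'


def first_child_of_body(xml_text):
    """Stand-in for //body/*[1]: the first element child of body.

    Assumption: body is a direct child of the root element.
    """
    body = ET.fromstring(xml_text).find("body")
    return body[0]


def below_only(xml_text):
    """Mirrors //body/*[1]//*[@href]: only elements strictly below the first child."""
    start = first_child_of_body(xml_text)
    return [e.tag for e in start.findall(".//*[@href]")]


def self_or_below(xml_text):
    """Emulates //body/*[1]/descendant-or-self::node()[@href].

    Element.iter() yields the start element itself, then all its descendants.
    """
    start = first_child_of_body(xml_text)
    return [e.tag for e in start.iter() if "href" in e.attrib]


print(below_only(FLAT))      # first query, document from the question
print(self_or_below(FLAT))   # second query, document from the question
print(below_only(NESTED))    # first query, document with the extra <p>
```

On the question's document the first call finds nothing, because the anchor is the starting element and has no children, while the other two calls each find the a element.
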
Jens Erat answered Sep 28 '22 07:09


I think the problem is in your description: it does not appear to match your example.

Given the input:

<html>
    <body>
        <a href="http://www.google.be"/>
    </body>
</html>

and the requirements statement:

"all elements with an href attribute from the first child of the body element"

Your XPath formulation of:

//body/*[1]//*[@href]

matches your requirements statement. However, the expected output is then an empty sequence, exactly as you found, and not the output you suggested:

<a href="http://www.google.be"/>

To get the suggested output, your requirements statement would perhaps instead be:

"the first child of the body element with an href attribute", which would lead to the XPath:

//*[@href][parent::body][1]

From your requirements statement and the mismatched example, it is hard to be sure exactly what you meant. So perhaps your requirements statement is:

"the first element in the body with an href attribute"

If that is the case, then I would suggest the XPath:

($input//*[@href][ancestor::body])[1]

Note that the parentheses, i.e. the '(' and ')', gather the selected descendants into a single sequence, so that [1] addresses the first selected descendant overall, in a manner similar to indexing an array.
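
To see how the two readings can diverge, here is a rough sketch in Python using the standard library's xml.etree.ElementTree. The document, its example.com URLs, and the function names are all hypothetical, and ElementTree only approximates the XPath expressions above (it has no parent:: or ancestor:: axes), so the filtering is done by starting the search from the body element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (the example.com URLs are made up): the first link in
# document order is nested in a <div>, the second is a direct child of body.
DOC = (
    '<html><body>'
    '<div><a href="http://example.com/nested"/></div>'
    '<a href="http://example.com/direct"/>'
    '</body></html>'
)


def first_child_with_href(body):
    """Reading 1: the first direct child of body that has an href."""
    matches = body.findall("*[@href]")
    return matches[0] if matches else None


def first_descendant_with_href(body):
    """Reading 2: the first element anywhere inside body that has an href,
    in document order (roughly what the parenthesised ( ... )[1] form selects)."""
    return next(iter(body.iterfind(".//*[@href]")), None)


body = ET.fromstring(DOC).find("body")
print(first_child_with_href(body).get("href"))       # the direct child link
print(first_descendant_with_href(body).get("href"))  # the nested link
```

With this input the first reading picks the link that is a direct child of body, while the second picks the nested link because it comes first in document order.
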

adamretter answered Sep 28 '22 07:09