
XPath: What do nested square brackets mean?

I'm learning XPath for web scraping and stumbled across these two XPath examples:

//div[@class="head"][@id="top"]

and

//div[@class='canvas- graph']//a[@href='/accounting.html'][i[@class='icon-usd']]/following-sibling::h4

I wonder what div[@class="head"][@id="top"] means. Does it mean that the @id="top" attribute belongs to the div element? Is it the same as //div[@class="head" and @id="top"]?
And what does it mean when square brackets are nested inside one another, as in the second example? What would the HTML DOM have to look like for the second XPath expression to match it?

blablaalb asked Dec 31 '22 23:12


2 Answers

Square brackets delimit predicates, and predicates filter items††.

You anticipate two ways in which predicates can be combined:

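  1. Consecutively: Yes, for predicates like these that do not depend on position, this is equivalent to logically and-ing them. So, correct, //div[@class="head"][@id="top"] is equivalent to //div[@class="head" and @id="top"]. (A positional predicate such as [1] is evaluated against the already filtered set, so there the order of the predicates matters.)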

  2. Recursively: Yes, XPath allows predicates within predicates (nesting, as you observe).

    So, a[@href='/accounting.html'][i[@class='icon-usd']] selects those a elements that have an @href attribute value equal to '/accounting.html' and a child i element whose @class attribute value equals 'icon-usd'.

Together these composition mechanisms provide a powerful means of building predicates out of more basic conditions.
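To make this concrete, here is a small sketch that runs both expressions from the question. It assumes the third-party lxml library is installed, and the markup is a hypothetical document invented purely so that each expression has exactly one match; it is one possible answer to "what would the DOM look like", not the only one.

```python
from lxml import etree  # third-party dependency (assumption): pip install lxml

# Hypothetical markup, invented for illustration only.
DOC = """
<html>
  <body>
    <div class="head" id="top">header</div>
    <div class="head" id="other">wrong id</div>
    <div class="canvas- graph">
      <section>
        <a href="/accounting.html"><i class="icon-usd"/></a>
        <h4>Accounting</h4>
      </section>
      <section>
        <a href="/accounting.html"><i class="icon-eur"/></a>
        <h4>wrong icon</h4>
      </section>
      <section>
        <a href="/sales.html"><i class="icon-usd"/></a>
        <h4>wrong href</h4>
      </section>
    </div>
  </body>
</html>
"""

root = etree.fromstring(DOC.strip())

# Consecutive predicates versus a single predicate using "and".
chained = [d.text for d in root.xpath('//div[@class="head"][@id="top"]')]
combined = [d.text for d in root.xpath('//div[@class="head" and @id="top"]')]

# Nested predicate: the inner [@class='icon-usd'] filters the child i
# elements, and the outer [...] keeps an a only if such a child exists.
nested = [
    h.text
    for h in root.xpath(
        "//div[@class='canvas- graph']//a[@href='/accounting.html']"
        "[i[@class='icon-usd']]/following-sibling::h4"
    )
]

print(chained)   # ['header']
print(combined)  # ['header']
print(nested)    # ['Accounting']
```

Note that @class='canvas- graph' is a plain string comparison, so the attribute value must match character for character, including the space.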


Predicate references: XPath 1.0. XPath 3.1.
††Node-sets in XPath 1.0; sequences in XPath 2.0+.

kjhughes answered Jan 14 '23 14:01


The square brackets delimit a predicate. The XPath 1.0 specification describes predicates as follows:

A predicate filters a node-set with respect to an axis to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated with that node as the context node, with the number of nodes in the node-set as the context size, and with the proximity position of the node in the node-set with respect to the axis as the context position; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included.

A PredicateExpr is evaluated by evaluating the Expr and converting the result to a boolean. If the result is a number, the result will be converted to true if the number is equal to the context position and will be converted to false otherwise; if the result is not a number, then the result will be converted as if by a call to the boolean function. Thus a location path para[3] is equivalent to para[position()=3].
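The evaluation rule quoted above can be modelled in a few lines of plain Python. This is only a sketch of the described behaviour for a forward axis, not how any real XPath engine is implemented; the names apply_predicate and PredicateExpr and the string "nodes" are made up for illustration.

```python
from typing import Callable, List, Union

# Illustrative model: a predicate expression sees the context node, the
# context position (1-based) and the context size, and returns a value.
PredicateExpr = Callable[[str, int, int], Union[bool, int, float, str]]


def apply_predicate(nodes: List[str], expr: PredicateExpr) -> List[str]:
    """Filter nodes the way the quoted rule describes (forward axis only)."""
    size = len(nodes)
    kept = []
    for position, node in enumerate(nodes, start=1):
        result = expr(node, position, size)
        if isinstance(result, (int, float)) and not isinstance(result, bool):
            # A numeric result is true only if it equals the context position.
            keep = result == position
        else:
            # Anything else is converted to a boolean.
            keep = bool(result)
        if keep:
            kept.append(node)
    return kept


paras = ["p1", "p2", "p3", "p4"]

# para[3]: the expression evaluates to the number 3.
third = apply_predicate(paras, lambda node, pos, size: 3)
# para[position()=3]: the expression evaluates to a boolean.
third_explicit = apply_predicate(paras, lambda node, pos, size: pos == 3)
# para[position()=last()]: last() corresponds to the context size.
last = apply_predicate(paras, lambda node, pos, size: pos == size)

print(third, third_explicit, last)  # ['p3'] ['p3'] ['p4']
```

The numeric branch is what makes para[3] shorthand for para[position()=3].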

Inside the predicate you test whether a condition is true or false as a means of filtering the set of items selected to the left of the predicate. Think of it like a SQL WHERE clause.

You can choose to put multiple test criteria within a single predicate, or you can have multiple predicates. Depending on the XPath processor, there may be some advantage, from a tuning perspective or for clarity, in using multiple predicates rather than combining multiple tests with "and" inside a single predicate.
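One case where the two styles are not interchangeable is when a test depends on position. The sketch below, again assuming the third-party lxml library and a made-up list, shows that purely boolean tests give the same result either way, while a positional test is evaluated against whatever the earlier predicates left over.

```python
from lxml import etree  # third-party dependency (assumption): pip install lxml

# Hypothetical markup, invented for illustration only.
DOC = """
<ul>
  <li class="a">one</li>
  <li class="b">two</li>
  <li class="b">three</li>
</ul>
"""

root = etree.fromstring(DOC.strip())


def texts(expr):
    """Return the text of every element selected by expr."""
    return [e.text for e in root.xpath(expr)]


# Boolean tests only: chaining and "and" select the same nodes.
chained = texts("/ul/li[@class='b'][text()='three']")
combined = texts("/ul/li[@class='b' and text()='three']")

# Positional test after a filter: first li among those with class b.
first_b = texts("/ul/li[@class='b'][1]")
# Positional test inside the same predicate: position counts all li.
b_and_first = texts("/ul/li[@class='b' and position()=1]")
# Positional test before the filter: first li, kept only if class is b.
first_then_b = texts("/ul/li[1][@class='b']")

print(chained, combined)  # ['three'] ['three']
print(first_b)            # ['two']
print(b_and_first)        # []
print(first_then_b)       # []
```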

Mads Hansen answered Jan 14 '23 13:01