I played with different XPath queries in XPather (which works only with older Firefox versions) and noticed a difference between the results of the following queries.
This one shows some results:
//div[descendant::table/descendant::td[4]]
This one returns an empty list:
//div[//table//td[4]]
Are they different due to some rule, or is it just misbehavior of a particular XPath implementation? (It seems to use the Firefox engine; XPather is just an excellent simple GUI for querying.)
With XPath 1.0, // is an abbreviation for /descendant-or-self::node()/, so your first path is
/descendant-or-self::node()/div[descendant::table/descendant::td[4]]
while the second is rather different:
/descendant-or-self::node()/div[/descendant-or-self::node()/table/descendant-or-self::node()/td[4]]
So the major difference is that inside your first predicate you look down for descendants relative to the div element, while in the second predicate you look down for descendants from the root node / (also called the document node).
You might want //div[.//table//td[4]]
for the second path expression to come closer to the first one.
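In these paths the abbreviation is a plain textual substitution, so it can be sketched as a string rewrite. This is a naive illustration, not a parser: the function name expand_double_slash is hypothetical, and it would also rewrite a // that appears inside a string literal. Note how the expanded predicate of the second path starts with /, i.e. it is an absolute path.

```python
def expand_double_slash(path: str) -> str:
    """Rewrite every '//' as '/descendant-or-self::node()/' (XPath 1.0 abbreviation).

    Naive sketch: it does not skip '//' inside quoted string literals.
    """
    return path.replace("//", "/descendant-or-self::node()/")


print(expand_double_slash("//div[descendant::table/descendant::td[4]]"))
print(expand_double_slash("//div[//table//td[4]]"))
print(expand_double_slash("//div[.//table//td[4]]"))
```

In the last line the predicate begins with ./ instead of /, so it is evaluated relative to each div.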
[edit] Here is a sample:
<html>
<body>
<div>
<table>
<tbody>
<tr>
<td>1</td>
</tr>
<tr>
<td>2</td>
</tr>
<tr>
<td>3</td>
</tr>
<tr>
<td>4</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
With that sample, the path //div[descendant::table/descendant::td[4]] selects the div element, as it has a table child which has a fourth td descendant.
However, with //div[.//table//td[4]] we look for
//div[./descendant-or-self::node()/table/descendant-or-self::node()/td[4]]
which is short for
//div[./descendant-or-self::node()/table/descendant-or-self::node()/child::td[4]]
and in the sample there is no element having a fourth td child element: each tr has only one td child.
I hope that explains the difference. If you use //div[.//table/descendant::td[4]], you should get the same result as with your original form.
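To see where the positional predicate [4] is applied, the two predicates can be emulated by hand on the sample above. This is a minimal sketch assuming Python's standard xml.etree.ElementTree for parsing; it is not an XPath engine, and the helper names (descendants, fourth, pred_descendant, pred_double_slash) are hypothetical. ElementTree only iterates elements, which is harmless here because text nodes have no children.

```python
import xml.etree.ElementTree as ET

# The sample document from above (whitespace compacted).
SAMPLE = """<html><body><div><table><tbody>
<tr><td>1</td></tr><tr><td>2</td></tr>
<tr><td>3</td></tr><tr><td>4</td></tr>
</tbody></table></div></body></html>"""


def descendants(node):
    """descendant axis: all elements below `node` in document order (excludes `node`)."""
    return [e for e in node.iter() if e is not node]


def fourth(nodes):
    """Positional predicate [4] on one step's result, as a 0- or 1-item list."""
    return nodes[3:4]


def pred_descendant(div):
    """descendant::table/descendant::td[4], evaluated with `div` as the context node."""
    hits = []
    for table in (e for e in descendants(div) if e.tag == "table"):
        # [4] counts <td> descendants of THIS table, at any depth
        hits += fourth([e for e in descendants(table) if e.tag == "td"])
    return bool(hits)  # a non-empty node-set makes the predicate true


def pred_double_slash(div):
    """.//table//td[4] == ./descendant-or-self::node()/table/descendant-or-self::node()/child::td[4]"""
    hits = []
    for table in (e for e in descendants(div) if e.tag == "table"):
        for node in table.iter():  # descendant-or-self of the table (elements only)
            # [4] counts the <td> CHILDREN of each such node separately
            hits += fourth([c for c in node if c.tag == "td"])
    return bool(hits)


root = ET.fromstring(SAMPLE)
divs = list(root.iter("div"))
print([pred_descendant(d) for d in divs])    # [True]
print([pred_double_slash(d) for d in divs])  # [False]
```

Changing only the last step of pred_double_slash from child td elements to descendant td elements of the table turns it into pred_descendant, which is why .//table/descendant::td[4] restores the original result.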
There's an important note in the W3C document on XPath 1.0 (W3C Recommendation 16 November 1999), XML Path Language (XPath) Version 1.0, section 2 Location Paths, 2.5 Abbreviated Syntax:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
A similar note is in the document on XPath 3.1 (W3C Recommendation 21 March 2017), XML Path Language (XPath) 3.1, section 3 Expressions, 3.3 Path Expressions, 3.3.5 Abbreviated Syntax:
NOTE: The path expression //para[1] does not mean the same as the path expression /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their respective parents.
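That means the double slash inside a path is not just a textual shortcut for /descendant-or-self::node()/: it also starts a new level of iteration over the XML tree. The step expression to the right of // is evaluated separately, once for each descendant-or-self node of the current context node, so a positional predicate such as [1] counts the children of each of those nodes, not positions in the whole sequence of descendants.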
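The W3C note can be reproduced with a small hand-written emulation. This sketch assumes Python's xml.etree.ElementTree for parsing; the document and the function names abbreviated_first and unabbreviated_first are hypothetical, and the steps are coded by hand rather than run through an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two sections, the first with two <para> children.
DOC = "<doc><sec><para>a</para><para>b</para></sec><sec><para>c</para></sec></doc>"


def abbreviated_first(root, tag):
    """//tag[1] == /descendant-or-self::node()/child::tag[1].

    child::tag[1] is evaluated once per node, so [1] means "first `tag`
    child of that node". The document node above `root` is skipped: its
    only child is `root`, which is not a `tag` element in this example.
    """
    result = []
    for node in root.iter():  # descendant-or-self of the root element
        result += [c for c in node if c.tag == tag][:1]
    return [e.text for e in result]


def unabbreviated_first(root, tag):
    """/descendant::tag[1]: [1] is applied to ALL `tag` descendants in document order."""
    return [e.text for e in root.iter(tag)][:1]


root = ET.fromstring(DOC)
print(abbreviated_first(root, "para"))    # ['a', 'c']
print(unabbreviated_first(root, "para"))  # ['a']
```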
So the exact meaning of the predicate in this path
//div[ descendant::table/descendant::td[4] ]
is:
1. take all <table> nodes that are descendants of the current <div>,
2. for each <table> separately, build the sequence of its descendant <td> elements (in document order) and select the fourth one, if there is one,
3. combine the results; the predicate is true if the combined sequence is not empty.
Finally the path returns all <div> elements in the document that contain at least one table with four or more data cells (counting cells of tables nested inside it, of course). And since there are tables in the document which have 4 cells or more, the whole expression selects their respective <div> ancestors.
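Because each <table> is the context node for descendant::td[4], the fourth cell is counted per table, not across all tables of the <div>. A minimal sketch with a hypothetical document (two sibling tables of two cells each), assuming Python's xml.etree.ElementTree and a hypothetical helper descendants, shows the difference. The pooled variant corresponds to the parenthesized form (descendant::table/descendant::td)[4], which is not what the original predicate does.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <div> with two sibling tables of two cells each.
DOC = """<div>
<table><tr><td/><td/></tr></table>
<table><tr><td/><td/></tr></table>
</div>"""


def descendants(node, tag):
    """Elements named `tag` strictly below `node`, in document order."""
    return [e for e in node.iter(tag) if e is not node]


div = ET.fromstring(DOC)

# descendant::table/descendant::td[4]: [4] is applied per table (each table is the context)
per_table = [tds[3]
             for tds in (descendants(t, "td") for t in descendants(div, "table"))
             if len(tds) >= 4]

# For contrast only: (descendant::table/descendant::td)[4], all cells pooled first
pooled = [td for t in descendants(div, "table") for td in descendants(t, "td")][3:4]

print(len(per_table), len(pooled))  # 0 1
```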
On the other hand, the predicate in
//div[ //table//td[4] ]
means:
1. find all <table> elements in the document (more precisely, test the root node and every descendant of the root for whether it has a <table> child),
2. for each of them, look for a fourth <td> subelement (i.e. test whether the table or any of its descendants has at least four <td> children).
Please note the predicate subexpression does not depend on the context node. It is a global path, resolving to some sequence of nodes (possibly empty), so the predicate's boolean value depends only on the document's structure. If it is true, the whole path returns a sequence of all <div> elements in the document; otherwise it returns the empty sequence.
So the predicate would be true iff some table, or some element inside a table, had at least 4 <td> children.
And as far as I can see, all <tr> rows contain two or three cells; there is no element with 4 or more <td> children, so the predicate subexpression returns an empty sequence, the predicate is false and every <div> gets filtered out. Result: nothing (an empty sequence).
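The global nature of //table//td[4] shows up clearly when a document has several divs. In this hypothetical document only div "a" has a row with four cells, yet the absolute predicate selects both divs, while the relative .//table//td[4] selects only "a". This is a hand-coded sketch assuming Python's xml.etree.ElementTree; the helper some_table_has_fourth_td is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only div "a" contains a row with four cells.
DOC = """<body>
<div id="a"><table><tr><td/><td/><td/><td/></tr></table></div>
<div id="b"><p>no table here</p></div>
</body>"""


def some_table_has_fourth_td(tables):
    """The `table//td[4]` part: does some table, or an element inside it, have >= 4 <td> children?"""
    return any(
        len([c for c in node if c.tag == "td"]) >= 4
        for table in tables
        for node in table.iter()  # descendant-or-self of the table
    )


root = ET.fromstring(DOC)
divs = list(root.iter("div"))

# //table//td[4] starts at the document node: evaluated once, independent of the <div>.
global_pred = some_table_has_fourth_td(root.iter("table"))
absolute = [d.get("id") for d in divs if global_pred]

# .//table//td[4] starts at each <div>: only that div's own tables are considered.
relative = [d.get("id") for d in divs
            if some_table_has_fourth_td(t for t in d.iter("table") if t is not d)]

print(absolute)  # ['a', 'b']
print(relative)  # ['a']
```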