
XPath - All following siblings except first specific elements

Tags: xml, xhtml, xpath

Let's say I'm querying an XHTML document, and I want to select all of the siblings following a table with id='target'. However, I want neither the first <table> sibling nor the first <ol> sibling that follows this particular element. Here's the best I could come up with:

//table[@id='target']/following-sibling::*[not(self::table[1]) and not(self::ol[1])]

However, this isn't returning any results when it should. Obviously I'm not understanding some of the syntax for this (I couldn't find a good source of information). I would certainly appreciate it if someone more experienced with XPath syntax could give me a hand. Also, for purely academic purposes, I'd be curious what the above is actually doing.

UPDATE:
See LarsH's answer for the explanation of why my XPath wasn't working, and see Dimitre's answer for the accepted solution.

Shaun asked Nov 29 '10 04:11


3 Answers

Use:

//table[@id='target']/following-sibling::*[not(self::table) and not(self::ol)]
|
//table[@id='target']/following-sibling::table[position() > 1]
|
//table[@id='target']/following-sibling::ol[position() > 1]

This selects the union of three node-sets: all following siblings of the table that are neither table nor ol elements, all following table siblings at position 2 or greater, and all following ol siblings at position 2 or greater.

That is exactly what you want: all following siblings except the first following table sibling and the first following ol sibling.

This is pure XPath 1.0 and does not use any XSLT functions.

Dimitre Novatchev answered Nov 20 '22 06:11


Answering the second question first: what the above is doing is selecting all following siblings that are neither table nor ol elements.

Here's why: self::table[1] selects the context node's self (iff it passes the table element name test) and filters to select only the first node along the self:: axis. There is at most one node on the self:: axis passing the element name test, so the [1] is redundant. self::table[1] selects the context node whenever it is a table element, regardless of its position among its siblings. So not(self::table[1]) returns false whenever the context node is a table element, regardless of its position among siblings.

Similarly for self::ol[1].

How to do what you're trying to do: @John Kugelman's answer is almost correct, but misses the fact that we must ignore sibling elements before and including table[@id='target']. I don't think it's possible to do correctly in pure XPath 1.0. Do you have the possibility to use XPath 2.0? If you're working in a browser, the answer is generally no.

Some workarounds would be:

  • Skip the first following table sibling and the first following ol sibling by filtering on some other basis, such as their attributes;
  • Select //table[@id='target'] as a nodeset, return it to the host environment (i.e. outside of XPath, e.g. in JavaScript), loop through that nodeset; inside the loop: select following-sibling::* via XPath, iterate through that outside of XPath, and test each result (outside of XPath) to see if it is the first table or ol.
  • Select //table[@id='target'] as a nodeset, return it to the host environment (i.e. outside of XPath, e.g. in JavaScript), loop through that nodeset; inside the loop: select generate-id(following-sibling::table[1]) and generate-id(following-sibling::ol[1]) via XPath, receive those values into JS variables t1id and o1id, and construct a string for the XPath expression using the form "following-sibling::*[generate-id() != '" + t1id + "' and generate-id() != '" + o1id + "']" (the IDs are strings, so they must be quoted inside the expression). Evaluate that string in XPath and you have your answer! :-p Note that generate-id() is an XSLT function, so this only works where the XPath engine makes XSLT functions available.
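
The second workaround above (filter in the host environment rather than in XPath) could look roughly like the sketch below. It is only an illustration under several assumptions: it uses Python's standard-library xml.etree.ElementTree, which supports just a small XPath subset and has no following-sibling axis, so the siblings are collected in Python instead; the sample document, the element ids, and the function name following_siblings_except_first are all made up for the example; and the sample has no XHTML namespace (with one, each tag would carry a '{http://www.w3.org/1999/xhtml}' prefix).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (no XHTML namespace, for brevity).
SAMPLE = """
<body>
  <p id="before"/>
  <table id="target"/>
  <p id="p1"/>
  <table id="t1"/>
  <ol id="o1"/>
  <table id="t2"/>
  <ol id="o2"/>
  <div id="d1"/>
</body>
"""


def following_siblings_except_first(root, target_id, skip_tags=("table", "ol")):
    """Return the target table's following siblings, minus the first
    following sibling of each tag listed in skip_tags."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    target = root.find(f".//table[@id='{target_id}']")
    if target is None or target not in parent_of:
        return []

    siblings = list(parent_of[target])
    following = siblings[siblings.index(target) + 1:]

    already_skipped = set()
    kept = []
    for node in following:
        if node.tag in skip_tags and node.tag not in already_skipped:
            already_skipped.add(node.tag)  # drop only the first table / first ol
            continue
        kept.append(node)
    return kept


root = ET.fromstring(SAMPLE)
result_ids = [node.get("id") for node in following_siblings_except_first(root, "target")]
print(result_ids)  # ['p1', 't2', 'o2', 'd1']
```

In the sample, t1 and o1 are the first table and first ol after the target, so they are dropped, while the later t2 and o2 are kept along with every non-table, non-ol sibling.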

Update: A solution is possible in XPath 1.0 - see @Dimitre's.

LarsH answered Nov 20 '22 05:11


There's only going to be one node when you use the self:: axis, so I believe self::*[1] will always be true. Every node is going to be the first (and only) node on its own self:: axis. This means your bracketed expression is equivalent to [not(self::table) and not(self::ol)], meaning all the tables and lists get filtered out.

I don't have a test environment set up, but off the top of my head this might do better:

//table[@id='target']/following-sibling::*
    [not(self::table and not(preceding-sibling::table)) and
     not(self::ol    and not(preceding-sibling::ol))]

It'll need some tweaking, but the idea is to filter out tables that do not have preceding-sibling tables, and ols that do not have preceding-sibling ols.

John Kugelman answered Nov 20 '22 06:11