XPath: select items except the last one with contains() syntax

Tags: html, dom, xpath

I want to select the following HTML items (action, comedy), but not the last one (tags).

To select all of them, the following expression works:

//*[@id="video-tags"]//a[contains(@href,'tags')]

But when I try to select everything except the last one (tags), the following expression does not work:

//*[@id="video-tags"]//a[contains(@href,'tags') not(position() > last() -1)]

The HTML:

<ul id="video-tags">
        <li>Uploader: </li>
        <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
        <li><em>Tagged: </em></li>
        <li><a href="/tags/action">action</a>, </li>
        <li><a href="/tags/comedy">comedy</a>, </li>
        <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>

Thanks in advance

Nick

asked Sep 10 '13 09:09 by directory


1 Answer

Aside from the syntax error - you need an and, i.e. contains(@href,'tags') and not(position()...) - you're tripping up on a subtlety of how // is defined.

The XPath //a[position() < last()] will not give you every a except the last one; it will give you every a that is not the last a inside its respective parent element. Since each li contains at most one a, every a is the last a in its respective parent, so this test will match nothing at all.
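The per-parent behaviour can be checked with a small sketch in Python. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath but does evaluate positional predicates such as [last()] relative to each parent. The variable names (HTML, all_links, last_in_parent) are illustrative, and the markup is the snippet from the question.

```python
import xml.etree.ElementTree as ET

# The markup from the question; it happens to be well-formed XML.
HTML = """<ul id="video-tags">
    <li>Uploader: </li>
    <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
    <li><em>Tagged: </em></li>
    <li><a href="/tags/action">action</a>, </li>
    <li><a href="/tags/comedy">comedy</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>"""

root = ET.fromstring(HTML)

# Every <a> below the <ul>, in document order.
all_links = root.findall(".//a")

# [last()] is evaluated per parent: each <li> holds at most one <a>,
# so every <a> is the last <a> of its own parent.
last_in_parent = root.findall(".//a[last()]")

print(len(all_links), len(last_in_parent))  # 4 4
```

Because all four links are "last" within their own li, a test for "not the last one" applied at the a step has nothing left to match.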

You can achieve what you want by wrapping most of the expression in parentheses and putting the position check in a separate predicate:

(//*[@id="video-tags"]//a[contains(@href,'tags')])[position() < last()]

The parentheses cause the final predicate to apply to the node set selected by the expression as a whole, rather than just to the a location step, i.e. it will first find all the a elements whose href contains "tags", then return all but the last of these selected elements in document order.
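As a rough illustration of those two stages, the sketch below emulates the parenthesised expression in plain Python rather than running it through an XPath engine: xml.etree.ElementTree does not support parenthesised expressions or contains(), so the filter and the final predicate are written as a list comprehension and a slice. The names tag_links and all_but_last are made up for this example, and the root element is assumed to be the ul with id "video-tags".

```python
import xml.etree.ElementTree as ET

HTML = """<ul id="video-tags">
    <li>Uploader: </li>
    <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
    <li><em>Tagged: </em></li>
    <li><a href="/tags/action">action</a>, </li>
    <li><a href="/tags/comedy">comedy</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>"""

root = ET.fromstring(HTML)  # root is the <ul id="video-tags"> element

# Stage 1, the part inside the parentheses: every <a> under the <ul>
# whose href contains "tags", collected into ONE list in document order.
tag_links = [a for a in root.iter("a") if "tags" in a.get("href", "")]

# Stage 2, the outer [position() < last()]: drop the final node of that list.
all_but_last = tag_links[:-1]

hrefs = [a.get("href") for a in all_but_last]
print(hrefs)  # ['/tags/action', '/tags/comedy']
```

With a full XPath 1.0 engine the original expression can be used directly; the slice here only mirrors what the outer predicate does to the combined node set.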


Technical explanation - the definition of // in XPath is that it is a shorthand for /descendant-or-self::node()/ (including the slashes), which is a location step that gives you this node and all its descendant nodes. So //a means /descendant-or-self::node()/child::a, and //a[something] means /descendant-or-self::node()/child::a[something] - the predicate applies to the child:: step, not the descendant-or-self:: one. If you want to apply a predicate to the descendant search then you should use the descendant:: axis explicitly - /descendant::a[something].
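The difference between the two expansions can be sketched by hand in Python. This is an emulation under stated assumptions, not an XPath implementation: the helper names slash_slash_a_first and descendant_a_first are hypothetical, only element nodes are visited (text nodes have no children, so this does not change the result), and the predicate [1] stands in for "something".

```python
import xml.etree.ElementTree as ET

HTML = """<ul id="video-tags">
    <li>Uploader: </li>
    <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
    <li><em>Tagged: </em></li>
    <li><a href="/tags/action">action</a>, </li>
    <li><a href="/tags/comedy">comedy</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>"""


def slash_slash_a_first(root):
    """Emulate //a[1], i.e. /descendant-or-self::node()/child::a[1]."""
    hits = []
    for node in root.iter():          # descendant-or-self step
        children = node.findall("a")  # child::a step, relative to this node
        hits.extend(children[:1])     # [1] is applied to each child list
    return hits


def descendant_a_first(root):
    """Emulate /descendant::a[1]: the predicate sees one flat list."""
    return list(root.iter("a"))[:1]


root = ET.fromstring(HTML)
per_parent = [a.get("href") for a in slash_slash_a_first(root)]
flat = [a.get("href") for a in descendant_a_first(root)]

print(per_parent)  # ['/profiles/wilco', '/tags/action', '/tags/comedy', '/tags/']
print(flat)        # ['/profiles/wilco']
```

The first helper returns one link per li because the predicate is re-evaluated for every parent; the second returns a single link because the predicate is applied once to the whole descendant list.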

answered Sep 30 '22 00:09 by Ian Roberts