
Xpath to select only direct siblings with matching attributes

Tags: python, xpath
I have the following example document:

<root>
  <p class="b">A</p>
  <p class="b">B</p>
  <p class="a">C</p>
  <p class="a">D</p>
  <p class="b">E</p>
  <x>
    <p class="b">F</p>
  </x>
</root>

I am looking for an XPath expression that selects all direct siblings of a given node with a matching class attribute, not just any sibling. In the above example, the first two <p class="b"> elements (A and B) should be selected together; likewise the two <p class="a"> elements (C and D); likewise the fifth <p class="b"> (E) on its own, as it has no direct siblings; likewise the single <p class="b"> (F) inside <x>. Note that in this context B and C are not direct siblings, because they have different class attribute values!

What I have is this:

xml.xpath("//p") # This selects all six <p> elements.
xml.xpath("//p[@class='b']") # This selects all four <p class="b"> elements.
xml.xpath("//p/following-sibling::p[@class='b']") # This selects all <p class="b"> sibling elements, even though not direct siblings.

The last expression selects the fifth sibling as well, although there are non-matching siblings in between.

How do I select only direct siblings with the same class value?

Edit: To clarify, note how the last two are individual selections, not siblings!

Edit: I have saved an example here. The XPath expression based on /root/p[1] is supposed to select A, B, C, D.

Jens asked Oct 18 '13 19:10


2 Answers

To get the very next sibling, add a position predicate of 1, which means the node immediately beside the current one.

following-sibling::*[1]

To ensure that the next sibling is a specific element, you can add the following filter, where p is the element name we want to match.

[self::p]

If you only want ones with the same attribute, you would also need to specify the attribute on the first p element.

So if you just want class b p elements that are immediately after a class b p element, you can do the following. This would just give you the second p element.

//p[@class='b']/following-sibling::*[1][@class='b'][self::p]

It sounds like you might actually want any class b element which is adjacent to another class b element. In that case, you can check the following and preceding sibling. The following would give you the first 2 p elements.

//p[@class='b'][following-sibling::*[1][@class='b'][self::p] 
                or preceding-sibling::*[1][@class='b'][self::p]]    
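
As a quick check, the two expressions above can be run against the example document from the question. This is a minimal sketch that assumes the third-party lxml package is installed (the question's xml.xpath(...) calls look like the lxml API); the helper name texts is made up for illustration.

```python
from lxml import etree  # third-party package; assumed to be installed

DOC = """
<root>
  <p class="b">A</p>
  <p class="b">B</p>
  <p class="a">C</p>
  <p class="a">D</p>
  <p class="b">E</p>
  <x>
    <p class="b">F</p>
  </x>
</root>
"""

root = etree.fromstring(DOC.strip())


def texts(expr):
    """Hypothetical helper: evaluate an XPath and return each hit's text."""
    return [el.text for el in root.xpath(expr)]


# A class-b <p> whose immediately preceding element is also a class-b <p>.
next_to_b = texts(
    "//p[@class='b']/following-sibling::*[1][@class='b'][self::p]"
)

# A class-b <p> with a class-b <p> directly before or directly after it.
adjacent_b = texts(
    "//p[@class='b'][following-sibling::*[1][@class='b'][self::p]"
    " or preceding-sibling::*[1][@class='b'][self::p]]"
)

print(next_to_b)   # ['B']
print(adjacent_b)  # ['A', 'B']
```

Neither expression returns E or F, since neither has an adjacent class-b <p> sibling.
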
Justin Ko answered Nov 18 '22 12:11


How about something like this:

//p[@class='b']/following-sibling::p[following-sibling::p[@class='a'] and @class='b']

It returns all following siblings that have @class='b' and themselves have a following sibling with @class='a'. It would not work for the last <p>, though, as it has no following siblings.
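
If a pure XPath solution is not required, one alternative is to do the grouping in Python instead. The sketch below is not the expression above but a different technique: it uses only the standard library (xml.etree.ElementTree and itertools.groupby) to collect runs of adjacent <p> children that share a class value, which is how the question describes the desired selections. The function name same_class_runs is made up for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

DOC = """
<root>
  <p class="b">A</p>
  <p class="b">B</p>
  <p class="a">C</p>
  <p class="a">D</p>
  <p class="b">E</p>
  <x>
    <p class="b">F</p>
  </x>
</root>
"""


def same_class_runs(parent):
    """Yield the texts of each run of adjacent <p> children with one class."""
    def key(el):
        return (el.tag, el.get("class"))

    # groupby only merges consecutive items, so a non-matching element
    # between two class-b paragraphs starts a new run.
    for (tag, _cls), run in groupby(list(parent), key=key):
        if tag == "p":
            yield [el.text for el in run]


root = ET.fromstring(DOC.strip())

runs = []
for parent in root.iter():  # visit every element as a potential parent
    runs.extend(same_class_runs(parent))

print(runs)  # [['A', 'B'], ['C', 'D'], ['E'], ['F']]
```

Here E and F come out as single-element runs, matching the "individual selections" mentioned in the question.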

rokras answered Nov 18 '22 13:11