
Xpath: select following until node

Tags: xpath

In XPath, I need to select the <p> nodes that follow each <h2>DATA</h2> node, up to the next <h2>. For example, a query against a structure like:

<div class="box">
    <h2>NO</h2>
    <p>B:<span> Y</span></p>
    <h2>DATA</h2>
    <p>AA:<span> CONTENT</span></p>
    <p>AA:<span> MORE</span></p>
    <h2>NO</h2>
    <p>C:<span> Z</span></p>
    <h2>DATA</h2>
    <p>BB:<span> CONTENT</span></p>
    <p>BB:<span> MORE</span></p>
</div>

should select:

    <p>AA:<span> CONTENT</span></p>
    <p>AA:<span> MORE</span></p>
    <p>BB:<span> CONTENT</span></p>
    <p>BB:<span> MORE</span></p>
TMichel asked Dec 28 '22 01:12



1 Answer

How about this?

p[preceding-sibling::h2[1][.="DATA"]]

My Python test checking the XPath above:

>>> from lxml import etree
>>> doc = etree.XML("""<div class="box">
...     <h2>NO</h2>
...     <p>B:<span> Y</span></p>
...     <h2>DATA</h2>
...     <p>AA:<span> CONTENT</span></p>
...     <p>AA:<span> MORE</span></p>
...     <h2>NO</h2>
...     <p>C:<span> Z</span></p>
...     <h2>DATA</h2>
...     <p>BB:<span> CONTENT</span></p>
...     <p>BB:<span> MORE</span></p>
... </div>""")
>>> doc.xpath('p[preceding-sibling::h2[1][.="DATA"]]')
[<Element p at 252ef70>, <Element p at 252efc8>, <Element p at 2542050>, <Element p at 25420a8>]
>>> doc.xpath('p[preceding-sibling::h2[1][.="DATA"]]/text()')
['AA:', 'AA:', 'BB:', 'BB:']
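
For readers without lxml: the standard library's xml.etree.ElementTree does not implement the preceding-sibling axis, so the expression above cannot be passed to it directly. Below is a minimal sketch of the same selection logic in plain Python, assuming the <h2> and <p> elements are direct children of one parent, as in the question. The function name select_after_heading is hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

XML = """<div class="box">
    <h2>NO</h2>
    <p>B:<span> Y</span></p>
    <h2>DATA</h2>
    <p>AA:<span> CONTENT</span></p>
    <p>AA:<span> MORE</span></p>
    <h2>NO</h2>
    <p>C:<span> Z</span></p>
    <h2>DATA</h2>
    <p>BB:<span> CONTENT</span></p>
    <p>BB:<span> MORE</span></p>
</div>"""


def select_after_heading(parent, heading_text="DATA"):
    """Return <p> children whose nearest preceding <h2> sibling reads heading_text.

    Hypothetical helper: mirrors p[preceding-sibling::h2[1][.="DATA"]]
    for a flat list of siblings.
    """
    selected = []
    in_section = False  # no <h2> seen yet, so nothing is selected
    for child in parent:
        if child.tag == "h2":
            # Each <h2> opens a new section; compare its full string value,
            # as the XPath test .="DATA" does.
            in_section = "".join(child.itertext()) == heading_text
        elif child.tag == "p" and in_section:
            selected.append(child)
    return selected


root = ET.fromstring(XML)
paragraphs = select_after_heading(root)

# itertext() walks descendants too, so the <span> content is included.
full_text = ["".join(p.itertext()) for p in paragraphs]
print(full_text)
# → ['AA: CONTENT', 'AA: MORE', 'BB: CONTENT', 'BB: MORE']
```

Unlike the /text() query in the session above, which returns only the text directly inside each <p>, joining itertext() also picks up the text of the nested <span>.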
MattH answered Jan 01 '23 10:01
