In XPath, I need to select the <p> nodes that follow each <h2>DATA</h2> node, up to the next <h2>, so in a structure like:
<div class="box">
<h2>NO</h2>
<p>B:<span> Y</span></p>
<h2>DATA</h2>
<p>AA:<span> CONTENT</span></p>
<p>AA:<span> MORE</span></p>
<h2>NO</h2>
<p>C:<span> Z</span></p>
<h2>DATA</h2>
<p>BB:<span> CONTENT</span></p>
<p>BB:<span> MORE</span></p>
</div>
it should select:
<p>AA:<span> CONTENT</span></p>
<p>AA:<span> MORE</span></p>
<p>BB:<span> CONTENT</span></p>
<p>BB:<span> MORE</span></p>
How about this?
p[preceding-sibling::h2[1][.="DATA"]]
My Python test for checking the XPath I provided:
>>> from lxml import etree
>>> doc = etree.XML("""<div class="box">
... <h2>NO</h2>
... <p>B:<span> Y</span></p>
... <h2>DATA</h2>
... <p>AA:<span> CONTENT</span></p>
... <p>AA:<span> MORE</span></p>
... <h2>NO</h2>
... <p>C:<span> Z</span></p>
... <h2>DATA</h2>
... <p>BB:<span> CONTENT</span></p>
... <p>BB:<span> MORE</span></p>
... </div>""")
>>> doc.xpath('p[preceding-sibling::h2[1][.="DATA"]]')
[<Element p at 252ef70>, <Element p at 252efc8>, <Element p at 2542050>, <Element p at 25420a8>]
>>> doc.xpath('p[preceding-sibling::h2[1][.="DATA"]]/text()')
['AA:', 'AA:', 'BB:', 'BB:']
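Note that `/text()` only returns the text nodes directly under each `<p>`, which is why the `<span>` content is missing from the last result. As a minimal sketch, assuming you want the full text of each matched paragraph, you could evaluate `string(.)` on every match (the names `XML`, `matched` and `full_text` are just for illustration). The expression works because `preceding-sibling` is a reverse axis, so `h2[1]` is the nearest preceding `<h2>` rather than the first one in the document:

```python
from lxml import etree

# Same sample document as in the question.
XML = """<div class="box">
<h2>NO</h2>
<p>B:<span> Y</span></p>
<h2>DATA</h2>
<p>AA:<span> CONTENT</span></p>
<p>AA:<span> MORE</span></p>
<h2>NO</h2>
<p>C:<span> Z</span></p>
<h2>DATA</h2>
<p>BB:<span> CONTENT</span></p>
<p>BB:<span> MORE</span></p>
</div>"""

doc = etree.XML(XML)

# Keep each <p> whose closest preceding <h2> sibling has the text "DATA".
matched = doc.xpath('p[preceding-sibling::h2[1][.="DATA"]]')

# string(.) concatenates all descendant text, so the <span> text is included.
full_text = [str(p.xpath("string(.)")) for p in matched]
print(full_text)
```

If the `<h2>` text may carry surrounding whitespace in your real data, comparing with `normalize-space(.)="DATA"` instead of `.="DATA"` is probably safer.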