I have HTML that looks like this:
<h1>Text 1</h1>
<div>Some info</div>
<h1>Text 2</h1>
<div>...</div>
I understand how to extract information from the h1 using Scrapy:
content.select("//h1[contains(text(),'Text 1')]/text()").extract()
But my goal is to extract the content of <div>Some info</div>.
My problem is that I don't have any specific information about the div. All I know is that it comes directly after <h1>Text 1</h1>. Can I use selectors to get the NEXT element in the tree, that is, the element at the same level in the DOM tree?
Something like:
a = content.select("//h1[contains(text(),'Text 1')]/text()")
a.next("//div/text()").extract()
Some info
When you pass text content to an XPath string function such as contains(), use . (dot) instead of .//text(). The expression .//text() yields a collection of text nodes (a node-set), and when a node-set is converted to a string only its first node is used.
Description:
/html/head/title: selects the <title> element inside the <head> element of an HTML document.
/html/head/title/text(): selects the text within that <title> element.
//td: selects all <td> elements in the document.
Try this XPath:
//h1[contains(text(), 'Text 1')]/following-sibling::div[1]/text()