How to select next node using scrapy

I have HTML that looks like this:

<h1>Text 1</h1>
<div>Some info</div>
<h1>Text 2</h1>
<div>...</div>

I understand how to use scrapy to extract the text of the h1:

content.select("//h1[contains(text(),'Text 1')]/text()").extract()

But my goal is to extract the content of <div>Some info</div>.

My problem is that I don't have any specific information about the div. All I know is that it comes directly after <h1>Text 1</h1>. Can I use selectors to get the NEXT element in the tree, that is, the element on the same level of the DOM tree?

Something like:

a = content.select("//h1[contains(text(),'Text 1')]/text()")
a.next("//div/text()").extract()
Some info
SkyFox asked Nov 04 '13 12:11



1 Answer

Try this XPath:

//h1[contains(text(), 'Text 1')]/following-sibling::div[1]/text()
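
To make it clearer what the `following-sibling::div[1]` step selects, here is a minimal sketch that emulates it with Python's standard library only. It is not Scrapy code: `xml.etree.ElementTree` does not support the `following-sibling` axis, so the hypothetical helper `next_sibling_div_text` walks the siblings by hand. The `<body>` wrapper around the question's markup is also an assumption, added so the fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a root element (an assumption)
# so that it parses as well-formed XML.
HTML = """<body>
<h1>Text 1</h1>
<div>Some info</div>
<h1>Text 2</h1>
<div>...</div>
</body>"""


def next_sibling_div_text(root, heading_text):
    """Return the text of the first <div> sibling after the matching <h1>.

    Hypothetical helper that emulates, for the first matching heading only:
    //h1[contains(text(), heading_text)]/following-sibling::div[1]/text()
    """
    # Siblings share a parent, so treat every element as a potential parent.
    for parent in root.iter():
        found_heading = False
        for child in parent:
            # Once the heading has been seen, the first <div> encountered
            # is the nearest following sibling <div>.
            if found_heading and child.tag == "div":
                return child.text
            if child.tag == "h1" and heading_text in (child.text or ""):
                found_heading = True
    return None


root = ET.fromstring(HTML)
result = next_sibling_div_text(root, "Text 1")
print(result)
```

In a current Scrapy callback the XPath above would typically be passed unchanged to `response.xpath(...)`, followed by `.get()` or `.extract()` to pull out the string. Note that `following-sibling::div[1]` picks the nearest following `<div>` even if other elements sit in between; if the `<div>` must be the very next element, `following-sibling::*[1][self::div]` is the stricter form.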
kev answered Sep 17 '22 12:09