How to get the text from a child node that may itself be a parent of other nodes in Scrapy using XPath

I need to extract the text from a child node that may or may not be the parent of another node, using XPath in Scrapy. Consider a case like

<h1 class="main">
 <span class="child">data</span>
</h1>

or

<h1 class="main">
<span class="child">
 <span class="child2">data</span>
</span>
</h1>

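My attempt was response.xpath(".//h1[@class='main']/span/text()").extract(), but it only returns text nodes that are direct children of the span, so it misses the text in the second case.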

Pramod asked Sep 29 '22 12:09


2 Answers

Use //text() instead of /text(). It returns a list of all text nodes inside your span, at any depth, so it covers both the direct text and the text of nested children:

response.xpath(".//h1[@class='main']/span//text()").extract()
Anzel answered Oct 02 '22 16:10


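You can use the XPath string() function, which returns the concatenated text of the first matched node and all of its descendants as a single string: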

  • response.xpath("string(.//h1[@class='main']/span)").extract()
  • or even response.xpath("string(.//h1[@class='main'])").extract() if you're after the whole header text
paul trmbrth answered Oct 02 '22 14:10