
How to get the number of <p> tags inside div in scrapy?

I am scraping this website link.

The last of the <p> tags contains the user_info, and it is causing a problem for me because I am using:

''.join(response.xpath('//div[@class="entry-content"]/p[2]/text()').extract())

But the index in p[2] changes when there are more paragraphs above it. Here, for example, it is p[5].

I am thinking of counting the number of <p> tags inside the div and assigning that number to my item.

How can I deal with this problem?

Nikhil Parmar asked Sep 29 '15 05:09


2 Answers

From what I understand, this is just the last paragraph in the entry content - you can use last():

//div[@class="entry-content"]/p[last()]/text()

Works for me.
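
To see why last() copes with a varying number of paragraphs, here is a minimal sketch. It uses Python's standard-library ElementTree as a stand-in for Scrapy's selector so that it runs without Scrapy installed. The sample markup and the helper name last_paragraph are made up for illustration. ElementTree's limited XPath has no text() step, so the text is read from the element's .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page, which is not shown here.
SHORT_PAGE = """<html><body><div class="entry-content">
<p>Intro</p><p>user_info: alice</p>
</div></body></html>"""

LONG_PAGE = """<html><body><div class="entry-content">
<p>One</p><p>Two</p><p>Three</p><p>Four</p><p>user_info: bob</p>
</div></body></html>"""


def last_paragraph(markup):
    """Return the text of the last <p> directly inside div.entry-content."""
    root = ET.fromstring(markup)
    # ElementTree supports the [last()] predicate but not a trailing text() step.
    node = root.find(".//div[@class='entry-content']/p[last()]")
    return node.text if node is not None else None


# The same expression picks the final paragraph whether there are 2 or 5.
print(last_paragraph(SHORT_PAGE))
print(last_paragraph(LONG_PAGE))
```

In a Scrapy callback the equivalent would presumably be the XPath from this answer passed to response.xpath(...), followed by extract() as in the question.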

alecxe answered Sep 18 '22 23:09


If you just want to count the p elements:

len(response.xpath('//div[@class="entry-content"]/p'))
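
The count can also be turned back into an index, since XPath positions are 1-based and p[count] is therefore the last paragraph. The following is only a sketch of that idea, again using the standard-library ElementTree rather than Scrapy itself. The sample markup and the helper names count_paragraphs and paragraph_at are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page is not shown here.
SAMPLE = """<html><body><div class="entry-content">
<p>One</p><p>Two</p><p>user_info: carol</p>
</div></body></html>"""

PARAGRAPHS = ".//div[@class='entry-content']/p"


def count_paragraphs(markup):
    """Count the <p> elements that are direct children of div.entry-content."""
    root = ET.fromstring(markup)
    return len(root.findall(PARAGRAPHS))


def paragraph_at(markup, position):
    """Return the text of the <p> at a 1-based position, or None if absent."""
    if position < 1:
        return None  # XPath positions start at 1
    root = ET.fromstring(markup)
    node = root.find(f"{PARAGRAPHS}[{position}]")
    return node.text if node is not None else None


n = count_paragraphs(SAMPLE)
print(n, paragraph_at(SAMPLE, n))
```

Counting and then indexing takes two queries where p[last()] takes one, so this route is mainly useful when the count itself is needed as data.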

saeedgnu answered Sep 17 '22 23:09