 

How to target data attribute with Scrapy

Tags:

python

scrapy

I'm using the Scrapy library to crawl a webpage.

But I have a problem: I don't know how to target a data attribute.

I have a link with a data attribute and an href, as follows:

<a data-item-name="detail-page-link" href="this-is-some-link">

What I want is the value of href. If it had a class, I could do it as follows:

response.css('.some-class::attr(href)') 

But the problem is that I don't know how to target the data-item-name attribute.

Any advice?

asked Jun 07 '18 07:06 by Boky





2 Answers

Using a Scrapy CSS selector, you can do:

response.css('a[data-item-name="detail-page-link"]::attr(href)').extract() 
answered Oct 05 '22 13:10 by Sijan Bhandari



I'm not sure if you can do this with the css method, but with the xpath method you should be able to do:

response.xpath("//a[@data-item-name]/@href").getall()
answered Oct 05 '22 13:10 by xystum
