I'm using the Scrapy library to crawl a webpage, but I have a problem: I do not know how to target a data attribute.
I have a link with a data attribute and an href, as follows:
<a data-item-name="detail-page-link" href="this-is-some-link">
What I want is the value of href. If the a element had a class, I could do it as follows:
response.css('.some-class::attr(href)')
But the problem is that I do not know how to target the data-item-name attribute.
Any advice?
When working with Scrapy, you first need to create a Scrapy project. The data is fetched by a spider, so move to the project's spiders folder and create a Python file there for the spider, for example gfgfetch.py.
Scrapy is built on Twisted, a popular event-driven networking framework for Python, so it uses non-blocking (asynchronous) code for concurrency. Scrapy does not execute JavaScript on its own, so to scrape a dynamic website that renders its content with JavaScript you typically need Selenium or another browser driver.
Using Scrapy's css selector, you can do:
response.css('a[data-item-name="detail-page-link"]::attr(href)').extract()
I'm not sure if you can do this with the css method, but with the xpath method you should be able to do:
response.xpath("//a[@data-item-name]/@href")