I'm using the Scrapy library to crawl a webpage, but I have a problem: I do not know how to target a data attribute.
I have a link with a data attribute and an href, as follows:
<a data-item-name="detail-page-link" href="this-is-some-link">
What I want is the value of href. If the a element had a class, I could do it as follows:
response.css('.some-class::attr(href)')
But the problem is that I do not know how to target the data-item-name attribute.
Any advice?
To work with Scrapy, you first need to create a Scrapy project. Data is fetched by a spider, so to create one, move to the project's spiders folder and create a Python file there, for example gfgfetch.py.
Scrapy is written with Twisted, a popular event-driven networking framework for Python, so it handles concurrency with non-blocking (asynchronous) code. Scrapy does not execute JavaScript on its own, so to scrape a dynamic website you have to use the Selenium driver or another WebDriver.
Using Scrapy's CSS selectors, you can match on the attribute directly:
response.css('a[data-item-name="detail-page-link"]::attr(href)').extract()
I'm not sure whether you can do this with the css method, but with the xpath method you should be able to do:
response.xpath("//a[@data-item-name]/@href")