I know that you can scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com'
to change the USER_AGENT
, but how do you add request headers?
You need to set the user agent which Scrapy allows you to do directly. import scrapy class QuotesSpider(scrapy. Spider): # ... user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.
Scrapy can crawl websites using the Request and Response objects. The request objects pass over the system, uses the spiders to execute the request and get back to the request when it returns a response object.
Making a request is a straightforward process in Scrapy. To generate a request, you need the URL of the webpage from which you want to extract useful data. You also need a callback function. The callback function is invoked when there is a response to the request.
there is no current way to add headers directly on cli, but you could do something like:
$ scrapy shell
...
...
>>> from scrapy import Request
>>> req = Request('yoururl.com', headers={"header1":"value1"})
>>> fetch(req)
This will update the current shell information with that new request.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With