I'd like to know if there's any way to view the headers that Scrapy sends when it issues a POST/GET request, whether in the live logs, the shell, or by any other similar means. Thanks!
You need to set the user agent, which Scrapy allows you to do directly. import scrapy class QuotesSpider(scrapy.Spider): # ... user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.
Request headers contain more information about the resource to be fetched, or about the client requesting the resource. Response headers hold additional information about the response, like its location or about the server providing it.
You can use the FormRequest.from_response() method for this job. Here's an example spider which uses it: import scrapy def authentication_failed(response): # TODO: Check the contents of the response and return True if it failed # or False if it succeeded.
Both Response and Request objects will have their headers available via the .headers attribute.
Headers for both objects are modified by the Middleware that sits between the Downloader and the Engine (see Scrapy Architecture). If you create a new Request object, it will carry only the headers you set on it explicitly until it has passed through the Middleware that assigns the rest.
To view the request object as it will be sent out, you will need to create a Middleware, put it closer to the Downloader than any other header-altering Middleware, and check the request.headers attribute at that time.
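A minimal sketch of such a Middleware, under some assumptions: the class name LogRequestHeadersMiddleware, the module path myproject.middlewares and the priority value 1000 are made up for illustration, and the priority only needs to be higher than that of every other header-altering Middleware enabled in your project. The class uses only the standard logging module and the documented process_request hook. Headers added at the transport level (Host, for example) may still not show up here.

```python
import logging

logger = logging.getLogger(__name__)


class LogRequestHeadersMiddleware:
    """Hypothetical downloader middleware that logs request headers."""

    def process_request(self, request, spider=None):
        # request.headers maps header names to lists of values (as bytes).
        logger.info(
            "%s %s headers: %s",
            request.method,
            request.url,
            dict(request.headers),
        )
        # Returning None tells Scrapy to keep processing the request normally.
        return None


# Assumed settings.py entry (path and number are illustrative):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.LogRequestHeadersMiddleware": 1000,
# }
```

With the log level at INFO or lower, each outgoing request should then produce one log line listing its headers.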
Alternatively, you can check the headers of the request that produced a Response returned to the Spider by viewing response.request.headers. This may not be the Request object you sent out, though, but the one that resulted in the Response object that was returned (for instance, redirects and retries mean the originally dispatched Request object differs from the Request object in response.request). Of course, this requires a Response object to have been returned to the spider, so it won't work for any Request that didn't generate a response (e.g. a DNS lookup error), or any Response that gets ignored or dropped via Middleware (e.g. HTTP Status 400).
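As a rough sketch of the response.request.headers approach, the helper below (sent_headers is a made-up name, not part of Scrapy) turns those headers into a plain dictionary of strings that is easy to log from a spider callback. It assumes the headers object yields byte-string names paired with lists of byte-string values, which is how Scrapy stores them.

```python
def sent_headers(response):
    """Return the headers of the request that produced this response.

    Hypothetical helper: decodes header names and values from bytes
    so they print cleanly in logs.
    """
    headers = response.request.headers
    return {
        name.decode("latin-1"): [value.decode("latin-1") for value in values]
        for name, values in headers.items()
    }


# Possible use inside a spider callback:
# def parse(self, response):
#     self.logger.info("Request headers: %s", sent_headers(response))
```

The same attribute can be inspected interactively in scrapy shell after fetching a URL, since the shell exposes a response object as well.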