
How can I view/get scrapy POST/GET request headers

I'd like to know if there's any way to view the headers that Scrapy sends when it issues a POST/GET request, whether in the live logs, in the shell, or by any other similar means. Thanks!

C4t4 asked Nov 03 '15 19:11




1 Answer

Both Response and Request objects will have their headers available via the .headers attribute.

Headers for both objects are modified by the downloader middlewares that sit between the Engine and the Downloader (see the Scrapy architecture overview). A newly created Request object carries only the headers you set on it yourself; the defaults (User-Agent and so on) are not there until it has passed through the middlewares that assign them.

To view the request as it will be sent out, create a downloader middleware, place it closer to the Downloader than any other header-altering middleware, and inspect the request.headers attribute at that point.
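A minimal sketch of such a middleware, assuming the classic process_request(request, spider) hook signature. The class name LogRequestHeadersMiddleware, the module path myproject.middlewares and the priority 1000 are all hypothetical; the priority only needs to be higher than that of any other header-altering middleware you have enabled.

```python
import logging

logger = logging.getLogger(__name__)


class LogRequestHeadersMiddleware:
    """Hypothetical downloader middleware that logs outgoing request headers."""

    def process_request(self, request, spider):
        # Runs for every request on its way to the Downloader. Placed last in
        # the chain, it sees the headers after the other middlewares have set them.
        logger.info("%s %s headers: %s", request.method, request.url, request.headers)
        # Returning None tells Scrapy to carry on processing the request.
        return None


# Enabling it in settings.py (path and priority are assumptions):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.LogRequestHeadersMiddleware": 1000,
# }
```

For process_request, middlewares with higher numbers run later, i.e. closer to the Downloader, which is why a large value is used here.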

Alternatively, you can check the headers of the request that returned a Response to the Spider by looking at response.request.headers. This may not be the Request object you sent out, though, but the one that resulted in the Response object that was returned (for instance, redirects and retries make the originally dispatched Request object differ from the Request object in response.request). Of course, this requires a Response object to have been returned to the spider, so it won't work for any Request that didn't generate a response (e.g. a DNS lookup error), or for any Response that gets ignored or dropped by a middleware (e.g. HTTP status 400).
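A rough sketch of this second approach. The helper name log_final_request_headers is hypothetical; it is meant to be called with the response from inside a spider callback such as parse. It relies on response.request and on the redirect_urls meta key, which Scrapy's redirect middleware fills in when a redirect was followed.

```python
import logging

logger = logging.getLogger(__name__)


def log_final_request_headers(response):
    """Log and return the headers of the request that produced `response`."""
    request = response.request
    # Set by the redirect middleware; absent if no redirect took place.
    redirect_chain = request.meta.get("redirect_urls", [])
    if redirect_chain:
        logger.info("Reached %s after redirects from %s", request.url, redirect_chain)
    logger.info("Headers of the request behind %s: %s", response.url, request.headers)
    return request.headers
```

The same attribute can be inspected interactively: in scrapy shell, after fetching a URL, response.request.headers shows the headers of the request that produced the response.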

Rejected answered Oct 13 '22 09:10