I am using Scrapy to crawl multiple websites, and I want to analyze the crawling rate. The stats dumped at the end of the crawl contain a downloader/response_count value and a response_received_count value, and the former is systematically greater than the latter.

Why is there a difference, and which component of the crawler increments each of the two values in the stats collector?
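CoreStats is the extension responsible for response_received_count, and DownloaderStats is the downloader middleware responsible for downloader/response_count.

The CoreStats extension connects the signals.response_received signal to a handler that increments response_received_count, so it counts every response that makes it through the downloader middlewares to the engine (including responses with error statuses). The DownloaderStats middleware, on the other hand, increments downloader/response_count in its process_response method, and it sits at a fixed position in the middleware chain: its order in DOWNLOADER_MIDDLEWARES_BASE is 850. Downloader middlewares process responses in decreasing order, so DownloaderStats sees a response before the middlewares with a lower number (for example RetryMiddleware at 550 and RedirectMiddleware at 600). If one of those middlewares turns the response into a new request (a retry or a redirect), drops it, or fails while processing it, the response has already been counted in downloader/response_count but never reaches the engine. The response_received signal is then never sent, and response_received_count is not increased. That is why downloader/response_count is the larger of the two.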
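The effect of the ordering can be reproduced with a toy model in plain Python (it does not import Scrapy, so it runs standalone). It is only a sketch under stated assumptions: StatsMiddleware, RetryLikeMiddleware and crawl are hypothetical stand-ins, not Scrapy classes. The only behavior borrowed from Scrapy is that process_response runs in decreasing middleware order and that a middleware returning a Request stops the chain so the request can be rescheduled.

```python
# Toy model, NOT Scrapy code: every class and function below is hypothetical.
# It mimics two Scrapy behaviors: process_response() runs in decreasing
# middleware order, and a middleware that returns a Request stops the chain.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    retries: int = 0


@dataclass
class Response:
    url: str
    status: int


class StatsMiddleware:
    """Stand-in for DownloaderStats (order 850): counts every response it sees."""

    order = 850

    def process_response(self, request, response, stats):
        stats["downloader/response_count"] += 1
        stats[f"downloader/response_status_count/{response.status}"] += 1
        return response


class RetryLikeMiddleware:
    """Stand-in for a retrying middleware (order 550): re-requests 503s."""

    order = 550

    def __init__(self, max_retries=2):
        self.max_retries = max_retries

    def process_response(self, request, response, stats):
        if response.status == 503 and request.retries < self.max_retries:
            return Request(request.url, request.retries + 1)
        return response


def crawl(statuses_per_url, middlewares):
    """Fake crawl: each download of a URL returns its next queued status."""
    pending = {url: list(statuses) for url, statuses in statuses_per_url.items()}
    chain = sorted(middlewares, key=lambda mw: mw.order, reverse=True)
    stats = Counter()
    queue = [Request(url) for url in pending]
    while queue:
        request = queue.pop(0)
        result = Response(request.url, pending[request.url].pop(0))
        for mw in chain:  # highest order first: 850, then 550
            result = mw.process_response(request, result, stats)
            if isinstance(result, Request):
                break  # already counted at 850, but the engine never sees this response
        if isinstance(result, Request):
            queue.append(result)
        else:
            # Stand-in for the engine sending signals.response_received (counted by CoreStats).
            stats["response_received_count"] += 1
    return stats


if __name__ == "__main__":
    stats = crawl(
        {"http://a.example": [200], "http://b.example": [503, 503, 200]},
        [RetryLikeMiddleware(), StatsMiddleware()],
    )
    print(stats["downloader/response_count"])  # 4
    print(stats["response_received_count"])  # 2
```

In this model the two retried 503 responses are counted by the stats stand-in but never reach the "engine", which gives 4 versus 2. Giving the stats stand-in an order below 550 would make the two counters equal here, because the retried responses would leave the chain before being counted.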