I am using Scrapy to crawl multiple websites, and I want to analyze the crawling rate.
The stats dumped at the end contain a downloader/response_count value and a response_received_count value. The former is systematically greater than the latter.
Why is there a difference, and which component of the crawler increments each of the two values in the stats collector?
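CoreStats is the extension responsible for response_received_count.
DownloaderStats is the downloader middleware responsible for downloader/response_count.
The CoreStats extension connects to the signals.response_received signal and increments response_received_count each time that signal fires. The engine sends the signal only for a Response that comes out of the whole downloader middleware chain, i.e. the final response handed on to the spider. Responses with error statuses such as 404 are included, because HttpErrorMiddleware is a spider middleware and runs later.
The DownloaderStats middleware has order 850 in DOWNLOADER_MIDDLEWARES_BASE (Scrapy's default settings). Because process_response() is called in decreasing order, it is one of the first middlewares to see each response coming back from the downloader, and it counts every one of them, including redirects (3xx) and responses that are about to be retried. Middlewares with a lower order number, such as RedirectMiddleware (600) or RetryMiddleware (550), run after it and may replace the response with a new Request, or drop it by raising IgnoreRequest. Such a response has already been counted in downloader/response_count, but it never reaches the engine, so response_received is never sent for it and response_received_count is not increased. That is why downloader/response_count is systematically greater than or equal to response_received_count; the downloader/response_status_count/3xx entries in the same stats dump often explain a large part of the gap.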
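To make the effect of the ordering concrete, here is a simplified pure-Python model of the downloader middleware chain. It is not Scrapy's code: the classes DownloaderStatsModel and RedirectModel, the functions run_chain and crawl, and the fake SITE mapping are hypothetical names for illustration. Only the stat keys and the order numbers 600 and 850 mirror Scrapy's defaults, and the model assumes that process_response() runs from the highest order number to the lowest, as in Scrapy.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    status: int
    location: Optional[str] = None  # redirect target for 3xx responses


class StatsCollector:
    """Tiny stand-in for Scrapy's stats collector: a dict of counters."""

    def __init__(self) -> None:
        self.values: dict = {}

    def inc_value(self, key: str) -> None:
        self.values[key] = self.values.get(key, 0) + 1


class DownloaderStatsModel:
    """Hypothetical model of DownloaderStats: counts every downloaded response."""

    def __init__(self, stats: StatsCollector) -> None:
        self.stats = stats

    def process_response(self, request, response):
        self.stats.inc_value("downloader/response_count")
        self.stats.inc_value(f"downloader/response_status_count/{response.status}")
        return response


class RedirectModel:
    """Hypothetical model of RedirectMiddleware: turns a 3xx response into a new Request."""

    def process_response(self, request, response):
        if response.status in (301, 302, 303, 307, 308) and response.location:
            return Request(response.location)
        return response


def run_chain(middlewares, request, response):
    # process_response runs in decreasing order number: 850 before 600.
    for _, mw in sorted(middlewares, key=lambda pair: pair[0], reverse=True):
        result = mw.process_response(request, response)
        if isinstance(result, Request):
            return result  # the response is dropped and replaced by a new request
        response = result
    return response


def crawl(start_urls, site):
    stats = StatsCollector()
    middlewares = [(600, RedirectModel()), (850, DownloaderStatsModel(stats))]
    queue = deque(Request(url) for url in start_urls)
    while queue:
        request = queue.popleft()
        raw = site[request.url]  # stand-in for the real download handler
        result = run_chain(middlewares, request, raw)
        if isinstance(result, Request):
            queue.append(result)  # the redirect target is crawled later
        else:
            # Only responses that survive the whole chain reach the engine,
            # which is where response_received fires (counted by CoreStats).
            stats.inc_value("response_received_count")
    return stats.values


# Fake site: one redirect, one 404 page and one normal page.
SITE = {
    "http://a/": Response("http://a/", 301, location="http://a/home"),
    "http://a/home": Response("http://a/home", 200),
    "http://b/": Response("http://b/", 404),
}

if __name__ == "__main__":
    values = crawl(["http://a/", "http://b/"], SITE)
    print(values["downloader/response_count"], values["response_received_count"])  # 3 2
```

Three responses are downloaded and all of them are counted by the stats middleware, but the 301 is turned into a new request by the redirect model before it reaches the "engine", so only two responses are received. The 404 still counts as received, matching the fact that HTTP error filtering happens later, in the spider middlewares.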