Scrapy: downloader/response_count vs response_received_count

I am using Scrapy to crawl multiple websites, and I want to analyze the crawl rate. The stats dumped at the end of the crawl contain a downloader/response_count value and a response_received_count value, and the former is systematically greater than the latter.

Why is there a difference, and which component of the crawler increments each of the two values in the stats collector?

Thibault Randria asked Jan 02 '18 17:01



1 Answer

  • CoreStats is the extension responsible for response_received_count.
  • DownloaderStats is the downloader middleware responsible for downloader/response_count.
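
If you want to compare the two counters in your own stats dump, a small helper can compute the gap and pull out the per-status counters that DownloaderStats records next to downloader/response_count (keys of the form downloader/response_status_count/<status>). This is a sketch that assumes the stats are available as a plain dict, for example from crawler.stats.get_stats(); counter_gap is a hypothetical name and the sample numbers are made up for illustration.

```python
from typing import Any, Dict, Mapping, Tuple


def counter_gap(stats: Mapping[str, Any]) -> Tuple[int, Dict[str, int]]:
    """Return (gap, per-status counts) for a Scrapy stats dump.

    `stats` is assumed to be a plain dict like the one returned by
    crawler.stats.get_stats(); missing counters are treated as 0.
    """
    downloaded = int(stats.get("downloader/response_count", 0))
    received = int(stats.get("response_received_count", 0))
    prefix = "downloader/response_status_count/"
    # DownloaderStats also records one counter per HTTP status code.
    by_status = {
        key[len(prefix):]: int(value)
        for key, value in stats.items()
        if key.startswith(prefix)
    }
    return downloaded - received, by_status


# Hypothetical numbers, for illustration only.
sample = {
    "downloader/response_count": 120,
    "downloader/response_status_count/200": 100,
    "downloader/response_status_count/301": 15,
    "downloader/response_status_count/503": 5,
    "response_received_count": 101,
}
gap, by_status = counter_gap(sample)
print(gap)        # 19
print(by_status)  # {'200': 100, '301': 15, '503': 5}
```

Looking at which status codes dominate the per-status counters is a quick way to narrow down where the extra downloader/response_count entries come from.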

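The CoreStats extension connects the signals.response_received signal to a handler that increments response_received_count, so it counts every response that reaches the engine (including ones with bad statuses such as 404). The DownloaderStats middleware increments downloader/response_count in its process_response method, and where it sits in the downloader middleware chain matters: its default order in DOWNLOADER_MIDDLEWARES_BASE is 850.

Downloader middlewares handle responses in decreasing order, so DownloaderStats counts a response before the middlewares with a lower number (for example RedirectMiddleware at 600 and RetryMiddleware at 550) see it. Those middlewares can replace the response with a new request (a redirect or a retry) or drop it by raising IgnoreRequest. When that happens the response never reaches the engine, response_received is never sent, and only downloader/response_count goes up, which is why it is systematically greater than response_received_count.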
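
To make the ordering argument concrete, here is a minimal, self-contained simulation in plain Python (no Scrapy import). Every class and function in it (Stats, DownloaderStatsLike, RedirectLike, download, fake_server) is a hypothetical stand-in that only mimics the relevant behavior: process_response hooks run from the highest order number to the lowest, a redirect-style middleware can swap a response for a new request, and response_received_count is only bumped for a response that survives the whole chain. The real Scrapy classes also take a spider argument and do much more.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


class Stats:
    """Hypothetical stand-in for Scrapy's stats collector."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}

    def inc_value(self, key: str, count: int = 1) -> None:
        self.values[key] = self.values.get(key, 0) + count


class DownloaderStatsLike:
    """Mimics DownloaderStats: counts every response it is handed."""

    order = 850

    def __init__(self, stats: Stats) -> None:
        self.stats = stats

    def process_response(self, request: Request, response: Response) -> Response:
        self.stats.inc_value("downloader/response_count")
        self.stats.inc_value(f"downloader/response_status_count/{response.status}")
        return response


class RedirectLike:
    """Mimics a redirect middleware: turns a 301/302 into a new Request."""

    order = 600

    def process_response(self, request: Request, response: Response) -> Union[Request, Response]:
        if response.status in (301, 302) and "Location" in response.headers:
            return Request(response.headers["Location"])
        return response


def download(request: Request, middlewares: List, stats: Stats,
             server: Callable[[Request], Response]) -> Optional[Request]:
    """Send one request through the simulated middleware chain and engine."""
    response = server(request)
    # process_response hooks run in decreasing order of their order number.
    for mw in sorted(middlewares, key=lambda m: m.order, reverse=True):
        result = mw.process_response(request, response)
        if isinstance(result, Request):
            # Response replaced by a new request: the engine never sees it.
            return result
        response = result
    # Only a surviving Response triggers the signal that CoreStats counts.
    stats.inc_value("response_received_count")
    return None


def fake_server(request: Request) -> Response:
    """Hypothetical server: /old redirects to /new, everything else is 200."""
    if request.url.endswith("/old"):
        return Response(request.url, 301, {"Location": "http://example.com/new"})
    return Response(request.url, 200)


stats = Stats()
middlewares = [RedirectLike(), DownloaderStatsLike(stats)]
pending = [Request("http://example.com/old")]
while pending:
    follow_up = download(pending.pop(), middlewares, stats, fake_server)
    if follow_up is not None:
        pending.append(follow_up)

print(stats.values["downloader/response_count"])  # 2
print(stats.values["response_received_count"])    # 1
```

With a single redirected URL the simulation ends with downloader/response_count at 2 and response_received_count at 1: the 301 is counted by the stats middleware and then consumed by the redirect middleware before it can reach the engine.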

eLRuLL answered Sep 18 '22 04:09
