
How to get the number of requests in queue in scrapy?

Tags: python, scrapy

I am using Scrapy to crawl some websites. How can I get the number of requests currently in the queue?

I have looked at the Scrapy source code and found that scrapy.core.scheduler.Scheduler may lead to my answer. See: https://github.com/scrapy/scrapy/blob/0.24/scrapy/core/scheduler.py

Two questions:

  1. How to access the scheduler in my spider class?
  2. What do self.dqs and self.mqs mean in the scheduler class?
Shuai Zhang asked Jan 27 '15 11:01



1 Answer

This took me a while to figure out, but here's what I used:

self.crawler.engine.slot.scheduler

That is the scheduler instance. Call len() on it to get the number of queued requests, or, if you only need a true/false answer about pending requests, do something like this:

self.crawler.engine.slot.scheduler.has_pending_requests()
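
On the second question: as far as I can tell from the 0.24 source linked in the question, mqs holds the in-memory queues and dqs the disk-backed queues (only set up when a job directory is configured), and the scheduler's length is the sum of both. Here is a toy model of that counting logic. It is not Scrapy's actual class; ToyScheduler and the plain lists are made up for illustration:

```python
class ToyScheduler:
    """Stand-in that mimics how the scheduler's length is computed."""

    def __init__(self, memory_requests, disk_requests=None):
        # Memory queue: always present.
        self.mqs = list(memory_requests)
        # Disk queue: None models "no job directory configured".
        self.dqs = None if disk_requests is None else list(disk_requests)

    def __len__(self):
        # Disk-backed requests only count when a non-empty disk queue exists.
        if self.dqs:
            return len(self.dqs) + len(self.mqs)
        return len(self.mqs)

    def has_pending_requests(self):
        # True as soon as anything is waiting in either queue.
        return len(self) > 0


memory_only = ToyScheduler(["req1", "req2"])
with_disk = ToyScheduler(["req1"], disk_requests=["req2", "req3"])
```

So len(scheduler) should give the total across both queues, whether or not the crawl is persisted to disk.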

Beware that requests can still be in progress even though the queue is empty. To check how many requests are currently running, use:

len(self.crawler.engine.slot.inprogress)
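
Putting the two counts together, a small helper could look like the sketch below. The name queue_stats is my own, and fake_crawler is only a stand-in built from SimpleNamespace so the function can be tried without starting a crawl; the only thing assumed about the real crawler is the engine.slot.scheduler and engine.slot.inprogress attributes used above.

```python
from types import SimpleNamespace


def queue_stats(crawler):
    """Return (pending, in_progress) request counts for a crawler."""
    slot = crawler.engine.slot
    pending = len(slot.scheduler)       # requests still waiting in the queue
    in_progress = len(slot.inprogress)  # requests currently being processed
    return pending, in_progress


# Hypothetical stand-in for a running crawler (not a real Scrapy object).
fake_crawler = SimpleNamespace(
    engine=SimpleNamespace(
        slot=SimpleNamespace(
            scheduler=["req1", "req2", "req3"],
            inprogress={"req0"},
        )
    )
)

pending, in_progress = queue_stats(fake_crawler)
```

Inside a spider callback you would call queue_stats(self.crawler) instead. These are engine internals rather than documented API, so the attribute names may differ in other Scrapy versions.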
Brad answered Nov 09 '22 21:11
