How can I stop a scrapy CrawlSpider and later resume where it left-off?

Tags: python, scrapy

I have a Scrapy CrawlSpider with a very large list of URLs to crawl. I would like to be able to stop it, saving its current state, and resume it later without having to start over. Is there a way to accomplish this within the Scrapy framework?

asked Sep 05 '11 by Dave Forgac


1 Answer

Just wanted to share that this feature is included in the latest Scrapy version, but the parameter name has changed. You should use it like this:

 scrapy crawl thespider --set JOBDIR=run1

For more information, see http://doc.scrapy.org/en/latest/topics/jobs.html#job-directory
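
For completeness, here is a minimal sketch of how this fits together. The spider name thespider matches the command above, but the domain, start URL, and extracted fields are placeholder assumptions, not taken from the question:

    # Minimal CrawlSpider sketch; its state is persisted when run with JOBDIR.
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class TheSpider(CrawlSpider):
        name = "thespider"
        allowed_domains = ["example.com"]    # placeholder domain
        start_urls = ["http://example.com/"]

        rules = (
            # Follow every link and pass each response to parse_item.
            Rule(LinkExtractor(), callback="parse_item", follow=True),
        )

        def parse_item(self, response):
            # Placeholder extraction logic.
            yield {"url": response.url, "title": response.css("title::text").get()}

To pause the crawl, send a single Ctrl-C (or SIGTERM); Scrapy shuts down gracefully and writes the pending request queue and seen-request fingerprints into the run1 directory. To resume, run the exact same command with the same JOBDIR. Note that each spider run that you want to pause and resume independently needs its own job directory.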

answered Oct 01 '22 by niko_gramophon