I have a Scrapy CrawlSpider with a very large list of URLs to crawl. I would like to be able to stop it, save its current state, and resume it later without having to start over. Is there a way to accomplish this within the Scrapy framework?
Just wanted to share that this feature is included in the latest Scrapy version, but the parameter name has changed. You should use it like this:
scrapy crawl thespider --set JOBDIR=run1
For more information, see the job directory documentation: http://doc.scrapy.org/en/latest/topics/jobs.html#job-directory
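For completeness, here is a minimal sketch of a spider you could run with that command. The spider name, domain, start URL, and callback are placeholders, not anything from the original post; the point is that persistence comes entirely from passing JOBDIR, with no changes to the spider code itself:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class TheSpider(CrawlSpider):
    name = "thespider"                      # matches the crawl command above
    allowed_domains = ["example.com"]       # placeholder domain
    start_urls = ["https://example.com/"]   # placeholder start URL

    rules = (
        # Follow every link and parse each page. Queue persistence is
        # handled by Scrapy's scheduler once JOBDIR is set, not by
        # anything in the spider itself.
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}

With this running under scrapy crawl thespider --set JOBDIR=run1, pressing Ctrl-C once lets the crawl shut down gracefully and write its pending-request queue and seen-request fingerprints into the run1 directory; rerunning the same command with the same JOBDIR resumes from where it left off. Note that pressing Ctrl-C a second time forces an immediate stop and can lose that state.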