Make Scrapy resume a crawl from where it stopped

Question

I'm using Scrapy to crawl a website, but sometimes the crawl gets interrupted (power failure, etc.).

How can I resume crawling from where it stopped? I don't want to start over from the seed URLs.

Danilo Bargen · Accepted Answer

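This can be done by persisting the scheduled requests to disk with the JOBDIR setting. Start the spider with a job directory, and run the same command again later to resume the crawl: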

scrapy crawl somespider -s JOBDIR=crawls/somespider-1
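
If the crawl is started from a Python script rather than the command line, the same JOBDIR setting can be passed to CrawlerProcess. The following is only a minimal sketch: the helper names job_settings and run_resumable and the "crawls/<spider>-<run>" directory layout are assumptions made for illustration, not part of Scrapy. Only the JOBDIR setting and the CrawlerProcess calls come from Scrapy itself.

```python
from pathlib import Path


def job_settings(spider_name: str, run_id: int = 1, base_dir: str = "crawls") -> dict:
    """Build a settings override that enables Scrapy's job persistence.

    JOBDIR is a real Scrapy setting; this helper and the directory
    naming scheme are hypothetical and exist only for this example.
    """
    job_dir = Path(base_dir) / f"{spider_name}-{run_id}"
    return {"JOBDIR": job_dir.as_posix()}


def run_resumable(spider_cls, run_id: int = 1) -> None:
    """Run spider_cls with a job directory so the crawl can be resumed later."""
    # Imported here so job_settings() can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=job_settings(spider_cls.name, run_id))
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl finishes or is stopped


if __name__ == "__main__":
    # Equivalent of: -s JOBDIR=crawls/somespider-1
    print(job_settings("somespider"))
```

Calling run_resumable again with the same spider and run_id points Scrapy at the same job directory, so it should pick up the pending requests instead of starting from the seeds. Use a different run_id (and therefore a different directory) for an unrelated crawl. Note that the Scrapy documentation describes resuming after the spider was stopped cleanly, so how much state survives an abrupt power loss is not guaranteed.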

See http://doc.scrapy.org/en/latest/topics/jobs.html for more information.
