I'm using Scrapy to crawl a website, but sometimes something goes wrong (a power outage, etc.).
How can I resume the crawl from where it stopped? I don't want to start over from the seed URLs.
This can be done by persisting the scheduled requests to disk with the JOBDIR setting:
scrapy crawl somespider -s JOBDIR=crawls/somespider-1
See http://doc.scrapy.org/en/latest/topics/jobs.html for more information.