
How to restart a Scrapy spider

What I need:

  1. Start the crawler.
  2. The crawler finishes its job.
  3. Wait 1 minute.
  4. Start the crawler again.

I tried this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from time import sleep

while True:
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    # Blocks until the crawl finishes; on the second pass through the loop
    # this call raises ReactorNotRestartable.
    process.start()
    sleep(60)

But I get this error:

twisted.internet.error.ReactorNotRestartable

Please help me do it right.

Python 3.6
Scrapy 1.3.2
Linux

sojowok asked Feb 19 '17 22:02



1 Answer

I think I found the solution:

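from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor, task

timeout = 60  # seconds to wait between the end of one crawl and the next


def run_spider():
    # Pause the loop while the crawl is in progress...
    loop.stop()
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    # ...and restart it once the crawl ends (on success or failure).
    # now=False: wait `timeout` seconds before calling run_spider() again.
    d.addBoth(lambda _: loop.start(timeout, now=False))


loop = task.LoopingCall(run_spider)
loop.start(timeout)  # now=True by default: the first crawl starts immediately

# Start the reactor exactly once; it keeps running across all crawls.
reactor.run()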
sojowok answered Oct 22 '22 16:10