What I need: to run a Scrapy spider repeatedly, once every 60 seconds, from a script.

I tried this:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from time import sleep

while True:
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    process.start()
    sleep(60)
But I get this error:
twisted.internet.error.ReactorNotRestartable
Please help me do this correctly.
Python 3.6
Scrapy 1.3.2
Linux
To force a spider to close you can raise a CloseSpider exception, as described in the Scrapy docs. Just be sure to return/yield your items before you raise the exception.
Try ctrl+c twice to terminate and ctrl+z+Enter to exit.
Basic script

The key to running Scrapy in a Python script is the CrawlerProcess class, which lives in the scrapy.crawler module. It provides the engine to run Scrapy within a Python script; internally, CrawlerProcess imports Python's Twisted framework.
The project settings module is the standard configuration file for your Scrapy project; it's where most of your custom settings will be populated. For a standard Scrapy project, this means you'll be adding or changing the settings in the settings.py file created for your project.
I think I found the solution. The trick is to use CrawlerRunner instead of CrawlerProcess, with a single long-running reactor: the reactor is started exactly once and never restarted, so ReactorNotRestartable never comes up.

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor, task

timeout = 60  # seconds between runs

def run_spider():
    # stop the loop while the spider runs, so runs never overlap
    l.stop()
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    # when the crawl finishes (success or failure), restart the loop;
    # now=False means the next run happens after `timeout` seconds
    d.addBoth(lambda _: l.start(timeout, False))

l = task.LoopingCall(run_spider)
l.start(timeout)  # fires immediately, then every `timeout` seconds

reactor.run()