I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code:
    from time import sleep

    from scrapy import signals
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher

    result = None


    def set_result(item):
        # Without "global", this would only bind a local variable.
        global result
        result = item


    while True:
        process = CrawlerProcess(get_project_settings())
        dispatcher.connect(set_result, signals.item_scraped)

        process.crawl('my_spider')
        process.start()

        if result:
            break
        sleep(3)
It works the first time, then I get the error. I create the process variable on each iteration, so what's the problem?
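By default, CrawlerProcess's .start() stops the Twisted reactor it creates once all crawlers have finished. A Twisted reactor cannot be started again after it has been stopped, so the second call to .start() raises ReactorNotRestartable even though process is a new object: every CrawlerProcess shares the same global reactor.
You should call process.start(stop_after_crawl=False) if you create process in each iteration. Note that .start() then blocks until the reactor is stopped some other way.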
Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The docs have an example of doing that.