I get twisted.internet.error.ReactorNotRestartable error when I execute following code:
from time import sleep from scrapy import signals from scrapy.crawler import CrawlerProcess from scrapy.utils.project import get_project_settings from scrapy.xlib.pydispatch import dispatcher result = None def set_result(item): result = item while True: process = CrawlerProcess(get_project_settings()) dispatcher.connect(set_result, signals.item_scraped) process.crawl('my_spider') process.start() if result: break sleep(3) For the first time it works, then I get error. I create process variable each time, so what's the problem?
By default, CrawlerProcess's .start() will stop the Twisted reactor it creates when all crawlers have finished.
You should call process.start(stop_after_crawl=False) if you create process in each iteration.
Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The docs have an example on doing that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With