
ReactorNotRestartable error in while loop with scrapy

I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code:

from time import sleep
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.xlib.pydispatch import dispatcher

result = None

def set_result(item):
    result = item

while True:
    process = CrawlerProcess(get_project_settings())
    dispatcher.connect(set_result, signals.item_scraped)

    process.crawl('my_spider')
    process.start()

    if result:
        break
    sleep(3)

It works on the first iteration, then I get the error. I create a new process variable each time, so what's the problem?

k_wit asked Oct 09 '16 17:10


1 Answer

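By default, CrawlerProcess's .start() stops the Twisted reactor it creates once all crawlers have finished. A Twisted reactor cannot be started again after it has been stopped, so the second process.start() call in the loop raises ReactorNotRestartable, even though process is a new object each time.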

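If you create a process in each iteration, you should call process.start(stop_after_crawl=False) so that the reactor is not stopped after the crawl. Keep in mind that start() then blocks until the reactor is stopped by some other means.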

Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The docs have an example of doing that.

paul trmbrth answered Oct 08 '22 03:10