 

scrapy: Call a function when a spider quits

Tags: python, scrapy

Is there a way to trigger a method in a Spider class just before it terminates?

I can terminate the spider myself, like this:

from scrapy.exceptions import CloseSpider
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    # Config stuff goes here...

    def quit(self):
        # Do some stuff...
        raise CloseSpider('MySpider is quitting now.')

    def my_parser(self, response):
        if termination_condition:
            self.quit()

        # Parsing stuff goes here...

But I can't find any information on how to determine when the spider is about to quit naturally.

Abe asked Sep 12 '12 18:09


2 Answers

It looks like you can register a signal listener through dispatcher.

I would try something like:

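from scrapy import signals
from scrapy.spiders import CrawlSpider
from scrapy.xlib.pydispatch import dispatcher  # deprecated in newer Scrapy versions

class MySpider(CrawlSpider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Call self.spider_closed when the spider_closed signal fires.
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        # The second parameter is the instance of the spider about to be closed.
        pass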

In newer versions of Scrapy, scrapy.xlib.pydispatch is deprecated. Instead, you can use from pydispatch import dispatcher.

dm03514 answered Oct 13 '22 11:10


Just to update: you don't need to connect anything yourself. Define a closed method on the spider and Scrapy calls it when the spider closes, like this:

class MySpider(CrawlSpider):
    def closed(self, reason):
        do_something()  # placeholder for your own cleanup code

THIS USER NEEDS HELP answered Oct 13 '22 13:10