How would one go about implementing a custom signal in Scrapy? My project implements a scoring system: depending on its score, an item is either accepted or rejected. I would like to send ITEM_ACCEPTED and ITEM_REJECTED signals so that I can collect stats about the crawl.
I looked at the source, https://github.com/scrapy/scrapy/blob/master/scrapy/signals.py, but it was unclear to me what is going on there.
Any clarification on how to send such a signal would also be helpful.
Any advice is appreciated!
Edit: I found this in the Scrapy docs:
http://doc.scrapy.org/en/latest/topics/api.html#module-scrapy.signalmanager
One of my spiders:
from scrapy.signalmanager import SignalManager
from Scrapers.extensions import signals  # my custom signals

def parse(self, response):
    manager = SignalManager(self)
    manager.send_catch_log(signals.ITEM_ACCEPTED)
    manager.send_catch_log(signals.ITEM_REJECTED)
My extension:
from scrapy import signals
from Scrapers.extensions import signals as custom

@classmethod
def from_crawler(cls, crawler):
    o = cls(crawler.stats)
    # built-in Scrapy signals
    crawler.signals.connect(o.spider_closed, signal=signals.spider_closed)
    crawler.signals.connect(o.spider_error, signal=signals.spider_error)
    crawler.signals.connect(o.item_scraped, signal=signals.item_scraped)
    # my custom signals
    crawler.signals.connect(o.item_accepted, signal=custom.ITEM_ACCEPTED)
    crawler.signals.connect(o.item_rejected, signal=custom.ITEM_REJECTED)
    return o

def item_accepted(self):
    print("it worked -- accepted")

def item_rejected(self):
    print("it worked -- rejected")
My custom signals module:
ITEM_ACCEPTED = object()
ITEM_REJECTED = object()
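You are instantiating a new SignalManager instead of using the crawler's own. The extension's handlers are connected through crawler.signals, so signals sent from a separate manager never reach them. Replace this line:
    manager = SignalManager(self)
with the crawler's actual signal manager:
    manager = self.crawler.signals
This worked for me.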