Custom Signals Scrapy

How would one go about implementing a custom signal in Scrapy? My project implements a scoring system: depending on its score, an item is either accepted or rejected. I would like to be able to signal ITEM_ACCEPTED and ITEM_REJECTED to collect stats about the crawl.

I was looking at the source, https://github.com/scrapy/scrapy/blob/master/scrapy/signals.py - but it was unclear to me what is going on here.

Any clarification on how to send this signal would also be helpful.

Any advice is appreciated!

Edit: I found this in the Scrapy docs:

http://doc.scrapy.org/en/latest/topics/api.html#module-scrapy.signalmanager

One of my spiders:

from scrapy.signalmanager import SignalManager
from Scrapers.extensions import signals  # my custom signals

def parse(self, response):
    manager = SignalManager(self)
    manager.send_catch_log(signals.ITEM_ACCEPTED)
    manager.send_catch_log(signals.ITEM_REJECTED)

my extension:

from scrapy import signals  # built-in signals (spider_closed, spider_error, item_scraped)
from Scrapers.extensions import signals as custom

@classmethod
def from_crawler(cls, crawler):
    o = cls(crawler.stats)
    crawler.signals.connect(o.spider_closed, signal=signals.spider_closed)
    crawler.signals.connect(o.spider_error, signal=signals.spider_error)
    crawler.signals.connect(o.item_scraped, signal=signals.item_scraped)
    crawler.signals.connect(o.item_accepted, signal=custom.ITEM_ACCEPTED)
    crawler.signals.connect(o.item_rejected, signal=custom.ITEM_REJECTED)
    return o

def item_accepted(self):
    print("it worked -- accepted")

def item_rejected(self):
    print("it worked -- rejected")

my signals module:

ITEM_ACCEPTED = object()
ITEM_REJECTED = object()
rocktheartsm4l asked Sep 30 '22 03:09


1 Answer

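You were instantiating a new signal manager instead of using the crawler's own. Your extension connected its handlers through crawler.signals, so signals sent through a separate manager never reach them. Replace this line:

manager = SignalManager(self)

with this, which gets the crawler's actual signal manager:

manager = self.crawler.signals

This worked for me.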
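
To illustrate why the manager instance matters, here is a minimal sketch using only the standard library. ToySignalManager, Crawler and Spider are hypothetical stand-ins written for this example and are not Scrapy's classes. The sketch assumes that handlers are looked up by the (sender, signal) pair, which is how I understand the dispatcher underneath Scrapy to behave. It is a simplified model, not the real implementation.

```python
ITEM_ACCEPTED = object()  # same pattern as the question's signals module


class ToySignalManager:
    """Simplified stand-in for a signal manager (assumption: NOT Scrapy's code)."""

    # Registry shared by all managers: (id(sender), id(signal)) -> list of handlers
    _registry = {}

    def __init__(self, sender):
        self.sender = sender

    def connect(self, receiver, signal):
        key = (id(self.sender), id(signal))
        self._registry.setdefault(key, []).append(receiver)

    def send_catch_log(self, signal, **kwargs):
        # Only handlers registered under this manager's own sender are called.
        key = (id(self.sender), id(signal))
        return [(r, r(**kwargs)) for r in self._registry.get(key, [])]


class Crawler:  # hypothetical placeholder
    pass


class Spider:  # hypothetical placeholder
    pass


crawler = Crawler()
crawler.signals = ToySignalManager(crawler)
spider = Spider()
spider.crawler = crawler

calls = []


def item_accepted():
    calls.append("accepted")


# What the extension does in from_crawler():
crawler.signals.connect(item_accepted, signal=ITEM_ACCEPTED)

# What the question's spider did: a fresh manager whose sender is the spider.
ToySignalManager(spider).send_catch_log(ITEM_ACCEPTED)
calls_after_new_manager = list(calls)

# What the fix does: reuse the crawler's manager.
spider.crawler.signals.send_catch_log(ITEM_ACCEPTED)
```

In this model the first send finds no handlers because nothing was registered under the spider as sender, while the second send reaches the extension's handler.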

gerosalesc answered Oct 22 '22 19:10