I've created a spider and connected a method to the spider_idle signal.
How do I add a request manually? I can't just return the request from parse -- parse is not running at this point, since all known URLs have already been parsed. I have a method that generates new requests, and I would like to run it from the spider_idle callback to add the request(s) it creates.
class FooSpider(BaseSpider):
    name = 'foo'

    def __init__(self):
        dispatcher.connect(self.dont_close_me, signals.spider_idle)

    def dont_close_me(self, spider):
        if spider != self:
            return
        # The engine instance will allow me to schedule requests, but
        # how do I get the engine object?
        engine = unknown_get_engine()
        engine.schedule(self.create_request())
        # afterward, ensure we stay alive by raising DontCloseSpider
        raise DontCloseSpider("..I prefer live spiders.")
UPDATE: I've determined that I probably need the ExecutionEngine object, but I don't know exactly how to get that from a spider, though it is available from a Crawler instance.
UPDATE 2: Thanks. crawler is attached as a property of the superclass, so I can just use self.crawler with no additional effort.
Scrapy crawls websites using Request and Response objects. Request objects travel through the system: the engine schedules each request, the downloader fetches it, and the resulting Response object is handed back to the spider's callback.
Making a request is straightforward in Scrapy. To build a request you need the URL of the page you want to extract data from, plus a callback function; the callback is invoked when a response to that request arrives.
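A minimal sketch of that pattern (the site and parsing logic here are placeholders, not part of the question):

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        # A Request is just a URL plus a callback; the callback runs
        # once the response for that URL has been downloaded.
        yield scrapy.Request('https://quotes.toscrape.com/',
                             callback=self.parse)

    def parse(self, response):
        # 'response' wraps the downloaded page for the request above.
        self.logger.info('Got %s', response.url)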
Essentially, I had to connect to the database, get the url and product_id, then scrape the URL while passing along its product id. All of this had to be done in start_requests, because that is the method Scrapy invokes to produce the initial requests. This method has to return (or yield) Request objects.
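A sketch of that approach, assuming a hypothetical SQLite database products.db with a table of (url, product_id) rows:

import sqlite3
import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'

    def start_requests(self):
        # Hypothetical database holding the URLs to scrape.
        conn = sqlite3.connect('products.db')
        for url, product_id in conn.execute(
                'SELECT url, product_id FROM products'):
            # Carry the product id along with the request via meta,
            # so the callback can tie the page back to its record.
            yield scrapy.Request(url, callback=self.parse_product,
                                 meta={'product_id': product_id})
        conn.close()

    def parse_product(self, response):
        product_id = response.meta['product_id']
        # ... extract fields and yield an item tagged with product_id ...
        self.logger.info('Scraped product %s', product_id)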
You need to set the user agent, which Scrapy allows you to do directly:

import scrapy

class QuotesSpider(scrapy.Spider):
    # ...
    user_agent = ('Mozilla/5.0 (Windows NT 6.3; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0')
from scrapy.spider import BaseSpider
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

class FooSpider(BaseSpider):
    def __init__(self, *args, **kwargs):
        super(FooSpider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.dont_close_me, signals.spider_idle)

    def dont_close_me(self, spider):
        # spider_idle fires for every spider in the process; only act on ours.
        if spider != self:
            return
        # The running engine is available via the crawler attached to the spider.
        self.crawler.engine.crawl(self.create_request(), spider)
        raise DontCloseSpider("..I prefer live spiders.")
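create_request is never shown in the question; a minimal hypothetical version only needs to return a Request object, along these lines:

from scrapy.http import Request

# Hypothetical helper, written as a method on FooSpider.
def create_request(self):
    # Build the next request from wherever the application gets URLs
    # (a queue, a database, an API, ...). dont_filter=True bypasses
    # the duplicate filter, which matters if the URL may already have
    # been seen during the crawl.
    return Request('http://example.com/next', callback=self.parse,
                   dont_filter=True)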
Update 2016:
import scrapy

# BaseSpider in the original answer; by 2016 the class is scrapy.Spider.
class FooSpider(scrapy.Spider):
    yet = False

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(FooSpider, cls).from_crawler(crawler, *args, **kwargs)
        # Connect through the crawler's signal manager rather than
        # the deprecated global dispatcher.
        crawler.signals.connect(spider.idle,
                                signal=scrapy.signals.spider_idle)
        return spider

    def idle(self):
        # Schedule exactly one extra request the first time the spider goes idle.
        if not self.yet:
            self.crawler.engine.crawl(self.create_request(), self)
            self.yet = True
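One caveat, per the current Scrapy docs: scheduling requests from a spider_idle handler is not guaranteed to keep the spider open by itself, because the scheduler may reject them (for example as duplicates), in which case the spider simply goes idle again. Raising DontCloseSpider, as in the earlier snippets, is the reliable way to force it to stay alive.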