I want to yield an item only when the crawling is finished. I am trying to do it via:

```python
def spider_closed(self, spider):
    item = EtsyItem()
    item['total_sales'] = 1111111
    yield item
```

But it does not yield anything, though the function is called. How do I yield an item after the scraping is over?
Depending on what you want to do, there might be a veeeery hacky solution for this. Instead of `spider_closed`, you may want to consider using the `spider_idle` signal, which is fired before `spider_closed`. One difference between idle and closed is that `spider_idle` still allows requests to be executed, and such a request can carry a callback or errback that yields the desired item.
Inside the spider class:

```python
from scrapy import Request, signals  # at module level

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super().from_crawler(crawler, *args, **kwargs)
    crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
    return spider

# ...

def yield_item(self, failure):
    # An errback receives a twisted Failure, not a response.
    yield MyItem(name='myname')

def spider_idle(self, spider):
    # Schedule one last request; its guaranteed failure triggers the errback.
    req = Request('https://fakewebsite123.xyz',
                  callback=lambda response: None,
                  errback=self.yield_item)
    self.crawler.engine.crawl(req, spider)  # newer Scrapy: engine.crawl(req)
```
However, this comes with several side effects, so I discourage anyone from using it in production; for example, the final request will raise a `DNSLookupError`. I just want to show what is possible.
Oof, I'm afraid `spider_closed` is used for tearing down. I suppose you can do it by attaching some custom stuff to a `Pipeline` to post-process your items.
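For what it's worth, here is a minimal sketch of that pipeline idea, assuming a hypothetical `TotalSalesPipeline` and a hypothetical `sales` field on the items. Note that `close_spider` cannot inject a new item into the feed export, so the pipeline has to persist the aggregated result itself:

```python
import json

class TotalSalesPipeline:
    # Hypothetical pipeline: aggregates a made-up 'sales' field
    # during the crawl and writes the summary on spider close.
    def open_spider(self, spider):
        self.total_sales = 0

    def process_item(self, item, spider):
        # Accumulate while items flow through the pipeline.
        self.total_sales += item.get('sales', 0)
        return item

    def close_spider(self, spider):
        # Called when the spider closes: post-process and persist.
        with open('totals.json', 'w') as f:
            json.dump({'total_sales': self.total_sales}, f)
```

Enable it by registering the class under `ITEM_PIPELINES` in `settings.py`.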