What is the function of "set_crawler" and "from_crawler" in 'crawl.py' in Scrapy?

Tags: python, scrapy

I can't understand these functions. If I inherit from Spider or CrawlSpider, should I override them? If not, why not?

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    # Let the base Spider class create the spider instance first
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
    # Then read the setting from the crawler and store it on the spider
    spider._follow_links = crawler.settings.getbool(
        'CRAWLSPIDER_FOLLOW_LINKS', True)
    return spider

def set_crawler(self, crawler):
    # Older hook: bind the crawler, then read the same setting
    super(CrawlSpider, self).set_crawler(crawler)
    self._follow_links = crawler.settings.getbool(
        'CRAWLSPIDER_FOLLOW_LINKS', True)
asked Apr 21 '15 by xina1i

1 Answer

Usually you don't need to override these functions, but it depends on what you want to do.

The from_crawler method (with the @classmethod decorator) is a factory method that Scrapy uses to instantiate the object (spider, extension, middleware, etc.) in which you define it.

It's often used to get a reference to the crawler object (which holds references to things like the settings and stats), and then either pass it as an argument to the object being created or set attributes on it.

In the particular example you've pasted, it's used to read the CRAWLSPIDER_FOLLOW_LINKS setting and store its value in a _follow_links attribute on the spider.
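
If you want to do something similar in your own spider, a minimal sketch of the same pattern follows; the MYSPIDER_CUSTOM_FLAG setting and the custom_flag attribute are made-up names for illustration:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Let the base class create and configure the spider first
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        # Then read any settings you need from the crawler
        spider.custom_flag = crawler.settings.getbool(
            'MYSPIDER_CUSTOM_FLAG', False)  # hypothetical setting name
        return spider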

You can see another simple example of the from_crawler method in this extension, which uses the crawler object to get the value of a setting, pass it as a parameter to the extension, and connect some signals to some of its methods.
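
That kind of extension boils down to something like the sketch below (the class, setting names, and limit value here are hypothetical; only the structure follows the pattern described above):

from scrapy import signals
from scrapy.exceptions import NotConfigured

class ItemCountExtension(object):

    def __init__(self, item_limit):
        self.item_limit = item_limit
        self.items_scraped = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Bail out unless the extension is enabled in the settings
        if not crawler.settings.getbool('MYEXT_ENABLED'):  # hypothetical
            raise NotConfigured
        # Read a setting and pass its value as a constructor argument
        ext = cls(crawler.settings.getint('MYEXT_ITEM_LIMIT', 1000))
        # Connect a signal to a method on the extension instance
        crawler.signals.connect(ext.item_scraped,
                                signal=signals.item_scraped)
        return ext

    def item_scraped(self, item, spider):
        self.items_scraped += 1

Like any other extension, this would be enabled through the EXTENSIONS setting.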

The set_crawler method has been deprecated in recent Scrapy versions and should be avoided; from_crawler covers the same use case.
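
If you were only using set_crawler to get hold of the crawler, note that in recent Scrapy versions the base from_crawler binds it for you, so self.crawler and self.settings are already available once the spider is running. A minimal sketch (the spider name and URL are placeholders):

import scrapy

class SettingsAwareSpider(scrapy.Spider):
    name = 'settings_aware'
    start_urls = ['http://example.com']  # placeholder URL

    def parse(self, response):
        # self.settings is available once the spider is bound to a crawler
        follow = self.settings.getbool('CRAWLSPIDER_FOLLOW_LINKS', True)
        self.logger.debug('follow links enabled: %s', follow)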

Read more:

  • Core API and the Crawler object
  • Writing your own Scrapy extension
  • Scrapy Signals
answered by Elias Dorneles