I can't understand these functions. If I inherit from Spider or CrawlSpider, should I override these functions? If not, then why?
@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
    spider._follow_links = crawler.settings.getbool(
        'CRAWLSPIDER_FOLLOW_LINKS', True)
    return spider

def set_crawler(self, crawler):
    super(CrawlSpider, self).set_crawler(crawler)
    self._follow_links = crawler.settings.getbool(
        'CRAWLSPIDER_FOLLOW_LINKS', True)
Usually you don't need to override these functions, but it depends on what you want to do.
The from_crawler method (with the @classmethod decorator) is a factory method that Scrapy uses to instantiate the object (spider, extension, middleware, etc.) where you added it.
It's often used to get a reference to the crawler object (which holds references to objects like settings, stats, etc.) and then either pass those values as arguments to the object being created or set attributes on it.
In the particular example you've pasted, it's being used to read the value of the CRAWLSPIDER_FOLLOW_LINKS setting and store it in the _follow_links attribute on the spider.
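If you did want to apply the same pattern in your own spider, a minimal sketch could look like the following. The MYSPIDER_MAX_ITEMS setting and the max_items attribute are made-up names for illustration, not real Scrapy settings:

import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Call the parent implementation first so the spider is created
        # and wired to the crawler as usual.
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Read a custom setting (hypothetical name) and store it on the spider.
        spider.max_items = crawler.settings.getint('MYSPIDER_MAX_ITEMS', 100)
        return spider

    def parse(self, response):
        # self.max_items is now available in callbacks.
        self.logger.info('max_items = %d', self.max_items)

The key point is to call the parent from_crawler first and only then attach whatever extra state you need before returning the spider.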
You can see another simple example of from_crawler usage in this extension, which uses the crawler object to get the value of a setting, pass it as a parameter to the extension, and connect some signals to some methods.
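As a rough sketch in that same spirit (the ITEMCOUNT_INTERVAL setting and the ItemCountLogger class are invented for illustration; the signals and the NotConfigured exception are standard Scrapy):

from scrapy import signals
from scrapy.exceptions import NotConfigured


class ItemCountLogger:

    def __init__(self, interval):
        self.interval = interval
        self.items_scraped = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Read a (hypothetical) setting; disable the extension if unset.
        interval = crawler.settings.getint('ITEMCOUNT_INTERVAL', 0)
        if not interval:
            raise NotConfigured
        ext = cls(interval)
        # Connect signals to methods on the extension instance.
        crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def item_scraped(self, item, spider):
        self.items_scraped += 1
        if self.items_scraped % self.interval == 0:
            spider.logger.info('Scraped %d items', self.items_scraped)

    def spider_closed(self, spider):
        spider.logger.info('Total items scraped: %d', self.items_scraped)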
The set_crawler method has been deprecated in recent Scrapy versions in favor of from_crawler and should be avoided.
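If you only need the crawler or its settings after the spider is constructed, recent Scrapy versions (1.0+) already bind them for you in the default from_crawler, so a sketch like this needs no override at all:

import scrapy


class SettingsAwareSpider(scrapy.Spider):
    name = 'settings_aware'

    def parse(self, response):
        # self.crawler and self.settings are set up by the default
        # from_crawler, so there is no need to override set_crawler.
        timeout = self.settings.getint('DOWNLOAD_TIMEOUT')
        self.logger.info('DOWNLOAD_TIMEOUT = %d', timeout)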