Trying to read the code of Scrapy. The words scraper, crawler and spider are confusing. For example
scrapy.core.scraper
scrapy.crawler
scrapy.spiders
Could anyone explain the meanings and differences of these terms in the context of Scrapy? Thanks in advance.
Crawler (scrapy.crawler) is the main entry point to the Scrapy API. It provides access to all Scrapy core components, and it is used to hook extension functionality into Scrapy.
The Scraper (scrapy.core.scraper) component is responsible for parsing responses and extracting information from them. It is run by the Engine, and it is used to run your spiders.
scrapy.spiders is a module containing the base Spider implementation (which you subclass to write your spiders), together with some common spiders available out of the box (like CrawlSpider for ruleset-based crawling, SitemapSpider for sitemap-based crawling, or XMLFeedSpider for crawling XML feeds).
More information available on the official documentation pages:
http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=crawlspider#spiders
http://doc.scrapy.org/en/latest/topics/api.html?highlight=scrapy.crawler#module-scrapy.crawler