
Difference between scraper, crawler and spider in the context of Scrapy

I'm trying to read the Scrapy source code, and the terms scraper, crawler and spider are confusing. For example:

scrapy.core.scraper
scrapy.crawler
scrapy.spiders

Could anyone explain the meanings and differences of these terms in the context of Scrapy? Thanks in advance.

Frozen Flame asked Dec 14 '15 06:12


1 Answer

The Crawler (scrapy.crawler) is the main entry point to the Scrapy API. It provides access to all Scrapy core components, and it is what extensions use to hook their functionality into Scrapy.

The Scraper (scrapy.core.scraper) component is responsible for parsing responses and extracting information from them. It is run by the Engine, and it is what runs your spiders' callbacks on the downloaded responses.

scrapy.spiders is a module containing the base Spider implementation (which you subclass to write your own spiders), together with some common spiders available out of the box (such as CrawlSpider for rule-based crawling, SitemapSpider for sitemap-based crawling, and XMLFeedSpider for crawling XML feeds).

More information is available in the official documentation:
http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=crawlspider#spiders
http://doc.scrapy.org/en/latest/topics/api.html?highlight=scrapy.crawler#module-scrapy.crawler

bosnjak answered Sep 28 '22 00:09