Can someone explain the difference between runspider and crawl commands? What are the contexts in which they should be used?
You can start by running the Scrapy tool with no arguments and it will print some usage help and the available commands:
Scrapy X.Y - no active project
Usage:
  scrapy <command> [options] [args]
Available commands:
  crawl         Run a spider
  fetch         Fetch a URL using the Scrapy downloader
[...]
Spiders are classes that you define and that Scrapy uses to scrape information from a website (or a group of websites). They must subclass Spider and define the initial requests to make, optionally how to follow links in the pages, and how to parse the downloaded page content to extract data.
In the command:
scrapy crawl [options] <spider>
<spider> is the spider's name (the name attribute defined on the spider class), not the project name that settings.py stores as BOT_NAME.
And in the command:
scrapy runspider [options] <spider_file>
<spider_file> is the path to the file that contains the spider.
Otherwise, the options are the same:
Options
=======
--help, -h                      show this help message and exit
-a NAME=VALUE                   set spider argument (may be repeated)
--output=FILE, -o FILE          dump scraped items into FILE (use - for stdout)
--output-format=FORMAT, -t FORMAT
                                format to use for dumping items with -o
Global Options
--------------
--logfile=FILE                  log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                                log level (default: DEBUG)
--nolog                         disable logging completely
--profile=FILE                  write python cProfile stats to FILE
--lsprof=FILE                   write lsprof profiling stats to FILE
--pidfile=FILE                  write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                                set/override setting (may be repeated)
--pdb                           enable pdb on failure
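To make the "same options, different target" point concrete, here is a small sketch that builds both command lines from Python. The helper build_scrapy_command, the spider name myspider, the file myspider.py and the category argument are all made up for illustration; only the -a and -o flags come from the help output above.

```python
import shlex


def build_scrapy_command(command, target, spider_args=None, output=None):
    """Build an argv list for `scrapy crawl` or `scrapy runspider`.

    Hypothetical helper: `target` is a spider name for crawl and a
    file path for runspider; the options are identical for both.
    """
    if command not in ("crawl", "runspider"):
        raise ValueError(f"unsupported command: {command!r}")
    options = []
    for name, value in (spider_args or {}).items():
        options += ["-a", f"{name}={value}"]  # spider argument, repeatable
    if output is not None:
        options += ["-o", output]  # dump scraped items into this file
    return ["scrapy", command, *options, target]


# Same options, only the command and the target differ.
crawl_cmd = build_scrapy_command(
    "crawl", "myspider", {"category": "books"}, "items.json"
)
run_cmd = build_scrapy_command(
    "runspider", "myspider.py", {"category": "books"}, "items.json"
)

print(shlex.join(crawl_cmd))
print(shlex.join(run_cmd))
```

The resulting lists could be handed to subprocess.run if you want to launch Scrapy from a script, assuming the scrapy executable is on your PATH.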
Since runspider doesn't depend on a project's settings.py (BOT_NAME included), you might find runspider more flexible, depending on how you customise your scrapers.
The main difference is that runspider does not need a project. That is, you can write a spider in a myspider.py file and call scrapy runspider myspider.py.
The crawl command requires a project in order to find the project's settings, load the available spiders from the SPIDER_MODULES setting, and look up the spider by name.
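If it helps, here is a toy model of the two lookup styles using only the standard library. It is not Scrapy's actual code: the function names are invented, the stand-in spider is a plain class with a name attribute, and a list of file paths plays the role of SPIDER_MODULES.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def spider_classes_in_file(path):
    """runspider style: import one file by path, return its spider-like classes."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        # Keep classes defined in this file that carry a string `name`.
        if cls.__module__ == module.__name__
        and isinstance(getattr(cls, "name", None), str)
    ]


def build_name_registry(paths):
    """crawl style: scan every known file and index the spiders by name."""
    registry = {}
    for path in paths:
        for cls in spider_classes_in_file(path):
            registry[cls.name] = cls
    return registry


with tempfile.TemporaryDirectory() as tmp:
    spider_file = Path(tmp) / "myspider.py"
    # Stand-in spider: a plain class, so the demo does not need Scrapy.
    spider_file.write_text('class MySpider:\n    name = "quotes"\n')

    by_path = spider_classes_in_file(spider_file)[0]       # needs only the file
    by_name = build_name_registry([spider_file])["quotes"]  # needs the registry

print(by_path.__name__, by_name.name)
```

The point of the sketch is that the by-name route only works once something (the project settings, in Scrapy's case) says where to look, whereas the by-path route needs nothing but the file.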
If you need a quick spider for a short task, then runspider requires less boilerplate.