I wrote a crawler with Scrapy. There is a function in the pipeline where I write my data to a database, and I use the logging module to record runtime logs.
I found that when my string contains Chinese characters, logging.error() throws an exception. But the crawler keeps running!
I know this is a minor error, but if a critical exception occurs I will miss it because the crawler keeps running.
My question is: is there a setting that forces Scrapy to stop when an exception is raised?
Exit Scrapy shell with the exit() command.
To force the spider to close you can raise the CloseSpider exception, as described here in the Scrapy docs. Just be sure to return/yield your items before you raise the exception.
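A minimal sketch of raising CloseSpider from a spider callback (the spider name, URL, and selectors are placeholders):

```python
import scrapy
from scrapy.exceptions import CloseSpider

class MySpider(scrapy.Spider):
    name = "my_spider"
    start_urls = ["https://example.com"]

    def parse(self, response):
        rows = response.css("div.item")
        if not rows:
            # Stop the whole crawl; the reason string shows up in the stats/logs.
            raise CloseSpider("no items found")
        for row in rows:
            # Yield items before raising, otherwise they are lost.
            yield {"title": row.css("::text").get()}
```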
start_urls contains the links from which the spider starts crawling. If you want to crawl recursively you should use CrawlSpider and define rules for it (see the sketch below).
Spiders are classes which define how a certain site (or a group of sites) will be scraped, including how to perform the crawl (i.e. follow links) and how to extract structured data from their pages (i.e. scraping items).
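For illustration, a minimal CrawlSpider sketch with one rule for recursive crawling (the domain, spider name, and extracted fields are placeholders):

```python
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class RecursiveSpider(CrawlSpider):
    name = "recursive"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    rules = (
        # Follow every extracted link and hand each response to parse_item.
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```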
You can use CLOSESPIDER_ERRORCOUNT
An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won't be closed by number of errors.
By default it is set to 0
CLOSESPIDER_ERRORCOUNT = 0
You can change it to 1 if you want the spider to stop as soon as the first error occurs.
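For example, in your project's settings.py:

```python
# settings.py
# Close the spider as soon as the first error is logged.
CLOSESPIDER_ERRORCOUNT = 1
```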
UPDATE
As the answers to this question point out, you can also use:
crawler.engine.close_spider(self, 'log message')
For more information, read:
Close spider extension
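A rough sketch of calling close_spider from an item pipeline when the database write fails; DatabasePipeline and the save method are hypothetical names for illustration:

```python
class DatabasePipeline:
    def process_item(self, item, spider):
        try:
            self.save(item)  # hypothetical database write
        except Exception as exc:
            spider.logger.error("Database write failed: %s", exc)
            # Ask the engine to stop the crawl with a reason string.
            spider.crawler.engine.close_spider(spider, "database error")
        return item
```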