How to force Scrapy to exit when there is an exception

I wrote a crawler with Scrapy.

There is a function in the pipeline where I write my data to a database. I use the logging module to record runtime messages.

I found that when a string contains Chinese characters, logging.error() throws an exception, but the crawler keeps running!

I know this particular error is minor, but if a critical exception occurs I will miss it because the crawler keeps running.

My question is: is there a setting that forces Scrapy to stop when there is an exception?

scott huang asked Jun 08 '17 09:06


1 Answer

You can use the CLOSESPIDER_ERRORCOUNT setting. The Scrapy documentation describes it as:

An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won’t be closed by number of errors.

By default it is set to 0 (CLOSESPIDER_ERRORCOUNT = 0). Change it to 1 if you want the spider to close on the first error.
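As a minimal sketch, assuming a standard Scrapy project layout where project-wide settings live in settings.py, the change could look like this:

```python
# settings.py (assumed location of the project settings)
# Close the spider as soon as the first error is counted.
CLOSESPIDER_ERRORCOUNT = 1
```

The same setting can also be passed for a single run with the -s option, for example scrapy crawl myspider -s CLOSESPIDER_ERRORCOUNT=1, where myspider is a placeholder spider name.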

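UPDATE

You can also close the spider explicitly from inside the spider:

self.crawler.engine.close_spider(self, 'log message')

The second argument is the reason that is recorded for the shutdown.

For more information, read the Scrapy documentation on the Close spider extension.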
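Since the question is about a failure inside an item pipeline, here is a minimal sketch of calling close_spider from there. It assumes the classic process_item(self, item, spider) signature and that the spider exposes its crawler as spider.crawler. DatabasePipeline, write_to_database and the reason string "pipeline_error" are hypothetical names chosen for illustration. As far as I can tell, CLOSESPIDER_ERRORCOUNT counts errors reported through the spider_error signal, which exceptions raised in a pipeline may not trigger, so an explicit call like this may be the more reliable route for pipeline failures.

```python
class DatabasePipeline:
    """Hypothetical item pipeline that stops the crawl on the first write failure."""

    def write_to_database(self, item):
        # Placeholder for the real database write; assumed to raise on failure.
        raise NotImplementedError

    def process_item(self, item, spider):
        try:
            self.write_to_database(item)
        except Exception:
            # Log the traceback through the spider's logger, then ask the
            # engine to shut the spider down with a recognisable reason.
            spider.logger.exception("Database write failed, closing spider")
            spider.crawler.engine.close_spider(spider, "pipeline_error")
            # Re-raise so the failure is still reported for this item.
            raise
        return item
```

Closing is not instantaneous: requests that are already in flight may still be processed before the spider stops, so a few more items can reach the pipeline after the call.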

parik answered Nov 21 '22 09:11