I wrote a crawler with Scrapy. There is a function in the pipeline where I write my data to a database, and I use the logging module to record runtime logs.
I found that when my string contains Chinese characters, logging.error() throws an exception. But the crawler keeps running!
I know this is a minor error, but if a critical exception occurs I will miss it because the crawler keeps running.
My question is: is there a setting that forces Scrapy to stop when an exception is raised?
Exit Scrapy shell with the exit() command.
To force the spider to close you can raise the CloseSpider exception, as described here in the Scrapy docs. Just be sure to return/yield your items before you raise the exception.
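A minimal sketch of raising CloseSpider from a spider callback (the spider name, URL, and selectors are placeholders):

```python
import scrapy
from scrapy.exceptions import CloseSpider

class MySpider(scrapy.Spider):
    name = "my_spider"
    start_urls = ["https://example.com"]

    def parse(self, response):
        rows = response.css("div.item")
        if not rows:
            # Stop the whole crawl; the reason string shows up in the stats/logs.
            raise CloseSpider("no items found")
        for row in rows:
            # Yield items before raising, otherwise they are lost.
            yield {"title": row.css("::text").get()}
```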
start_urls contains the links from which the spider starts crawling. If you want to crawl recursively you should use CrawlSpider and define rules for it (see the sketch below).
Spiders are classes which define how a certain site (or a group of sites) will be scraped, including how to perform the crawl (i.e. follow links) and how to extract structured data from their pages (i.e. scraping items).
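For illustration, a minimal CrawlSpider sketch with one rule for recursive crawling (the domain, spider name, and extracted fields are placeholders):

```python
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class RecursiveSpider(CrawlSpider):
    name = "recursive"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    rules = (
        # Follow every extracted link and hand each response to parse_item.
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```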
You can use CLOSESPIDER_ERRORCOUNT
An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won't be closed by number of errors.
By default it is set to 0
CLOSESPIDER_ERRORCOUNT = 0
You can change it to 1 if you want the spider to stop as soon as the first error occurs.
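For example, in your project's settings.py:

```python
# settings.py
# Close the spider as soon as the first error is logged.
CLOSESPIDER_ERRORCOUNT = 1
```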
UPDATE
As the answers to this question point out, you can also use:
crawler.engine.close_spider(self, 'log message')
For more information, read:
Close spider extension
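A rough sketch of calling close_spider from an item pipeline when the database write fails; DatabasePipeline and the save method are hypothetical names for illustration:

```python
class DatabasePipeline:
    def process_item(self, item, spider):
        try:
            self.save(item)  # hypothetical database write
        except Exception as exc:
            spider.logger.error("Database write failed: %s", exc)
            # Ask the engine to stop the crawl with a reason string.
            spider.crawler.engine.close_spider(spider, "database error")
        return item
```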