
Scrapy suppress handled errors

Relevant Code

from scrapy import Request

def start_requests( self ):
    requests = [ Request( url['url'], meta=url['meta'], callback=self.parse, errback=self.handle_error ) for url in self.start_urls if valid_url( url['url'] )]
    return requests

def handle_error( self, err ):
    # Errors are being saved in the DB,
    # so I don't want them displayed in the logs
    pass

I've got my own code for saving error codes in the DB, and I don't want them displayed in the log output. How can I suppress these errors?

Note that I don't want to suppress all errors - just the ones being handled here.

asked by HyderA
2 Answers

Try using sets such as self.skipped and self.failed together with an isinstance check in your handle_error method.

Here is an example:

from scrapy.spidermiddlewares.httperror import HttpError

def on_error(self, failure):
    if isinstance(failure.value, HttpError):
        response = failure.value.response
        if response.status in self.bypass_status_codes:
            self.skipped.add(response.url[-3:])
            return self.parse(response)

    # this assumes there is a response attached to the failure
    self.failed.add(failure.value.response.url[-3:])
    return failure
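
Applied to the handle_error method from the question, a minimal sketch might look like the following. save_error_to_db is a hypothetical stand-in for the existing DB code; the key idea is that returning nothing from the errback consumes the failure, so only unexpected errors are passed back for Scrapy to log.

from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError, TimeoutError

def handle_error(self, failure):
    if failure.check(HttpError):
        response = failure.value.response
        # save_error_to_db is a hypothetical helper standing in for the existing DB code
        save_error_to_db(url=response.url, status=response.status)
    elif failure.check(DNSLookupError, TimeoutError):
        save_error_to_db(url=failure.request.url, status=None)
    else:
        # unexpected errors: return the failure so Scrapy still logs them
        return failure
    # returning nothing here consumes the failure, keeping handled errors out of the logs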
answered by Daniil Mashkin

The answer by @Daniil Mashkin seems to be the most comprehensive solution.

For simple cases, you can add HTTP error codes to the spider's handle_httpstatus_list attribute or to HTTPERROR_ALLOWED_CODES in settings.py.

This sends those erroneous responses to your regular callback function instead of the errback, so they skip error logging as well.
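
For example, a spider along these lines (the spider name and the DB helper are illustrative) would receive 404 and 500 responses in its normal callback and could record them quietly:

import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'
    # responses with these status codes go to the callback instead of the errback
    handle_httpstatus_list = [404, 500]

    def parse(self, response):
        if response.status != 200:
            # record the error (hypothetical DB helper) and return without logging
            self.save_error_to_db(response.url, response.status)
            return
        # ... normal parsing of successful responses ...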

answered by Frederic Bazin