 

Handle all exceptions in Scrapy with Sentry

I've been working on a project with Scrapy for a while now, and I want to integrate Sentry.

I've used scrapy-sentry, but it didn't work at all.

I also tried to implement it as an extension, but that only works when an error occurs in a spider callback (not in pipelines.py, items.py, ...):

from scrapy import signals

from raven import Client


class FailLogger(object):

    def __init__(self, dsn):
        self.client = Client(dsn)

    @classmethod
    def from_crawler(cls, crawler):
        # read the DSN from the crawler's settings
        ext = cls(crawler.settings.get('SENTRY_DSN'))

        crawler.signals.connect(ext.spider_error, signal=signals.spider_error)
        return ext

    def spider_error(self, failure, response, spider):
        # spider_error is only sent for exceptions raised in spider callbacks
        try:
            failure.raiseException()
        except Exception:
            # report the re-raised exception (sys.exc_info()) to Sentry
            self.client.captureException()

Is there any way I can log errors (in spiders, items, pipelines, ...) to Sentry, like in Django?

Thank you.

asked Aug 12 '14 at 11:08 by elmkarami



1 Answer

It's an old post, but my answer may be useful to others. Raven has been replaced by sentry-python (published on PyPI as sentry-sdk). With this new package there is a much simpler and more complete solution than scrapy-sentry. It relies on the fact that Scrapy's logging is built on the standard-library logging module, and sentry-sdk's default logging integration sends every record logged at ERROR level or above to Sentry as an event.
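To see why hooking into stdlib logging is enough, here is a stdlib-only sketch of the mechanism. The hypothetical `CollectingHandler` stands in for sentry-sdk's logging hook (it is not Sentry's actual code): it keeps only records at ERROR level or above. The logger name mimics Scrapy's scraper module, which logs item-pipeline failures with the exception attached; the message text is illustrative.

```python
import logging


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for Sentry's hook: keeps ERROR+ records only."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.events = []

    def emit(self, record):
        # record whether the log call carried exception info, as Sentry would use it
        self.events.append((record.name, record.getMessage(), record.exc_info is not None))


handler = CollectingHandler()
logging.getLogger().addHandler(handler)  # root logger: sees records from every module

# Simulate a failing item pipeline reported through a Scrapy-style module logger
logger = logging.getLogger("scrapy.core.scraper")
try:
    raise ValueError("bad item")
except ValueError:
    logger.error("Error processing item", exc_info=True)

logger.info("Crawled page")  # below ERROR, so the handler ignores it
```

Because every Scrapy component logs through loggers that propagate to the root logger, one hook catches errors from spiders, middlewares, and pipelines alike.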

You can use the following very simple Scrapy extension to catch exceptions and errors inside and outside spiders (including downloader middlewares, item pipelines, etc.).

  1. Add the SentryLogging extension to the extensions.py file of your Scrapy project:
import sentry_sdk
from scrapy.exceptions import NotConfigured

class SentryLogging(object):
    """
    Send exceptions and errors to Sentry.
    """

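    @classmethod
    def from_crawler(cls, crawler):
        sentry_dsn = crawler.settings.get('SENTRY_DSN')
        if sentry_dsn is None:
            # no DSN configured: Scrapy skips this extension
            raise NotConfigured
        # instantiate the extension object
        ext = cls()
        # initialize the Sentry SDK; its default logging integration
        # captures ERROR-level log records, including Scrapy's
        sentry_sdk.init(sentry_dsn)
        # return the extension object
        return ext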
  2. Add the following lines to your settings.py to activate it. Use a low order value so that it is loaded, and starts catching exceptions and errors, as early as possible:
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
EXTENSIONS = {
    'myproject.extensions.SentryLogging': -1,  # load the SentryLogging extension before the others
}

# Send exceptions to Sentry
# replace the placeholder with your own DSN
SENTRY_DSN = "XXXXXXXXXX"

Make sure to replace the SENTRY_DSN value with the DSN of the corresponding Sentry project.
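If you prefer not to hard-code the DSN in settings.py, a minimal alternative is to read it from an environment variable. The variable name `SENTRY_DSN` below is an assumption; use whatever your deployment defines.

```python
# settings.py (sketch): take the DSN from the environment instead of hard-coding it
import os

# When the (assumed) SENTRY_DSN environment variable is unset, this is None,
# so the SentryLogging extension raises NotConfigured and Sentry stays disabled.
SENTRY_DSN = os.environ.get("SENTRY_DSN")
```

This also makes it easy to run the spider locally without reporting to Sentry: simply leave the variable unset.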

Errors and exceptions raised inside and outside spiders should now be sent to Sentry. If you want to further customize what is sent to Sentry, adjust the call to sentry_sdk.init() according to its documentation.
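For example, sentry_sdk.init() accepts a `before_send` callback that receives each event plus a `hint` dict (which holds `exc_info` when the event comes from an exception) and can return None to drop the event. The sketch below is hypothetical: it assumes you want to ignore the retry middleware's "gave up retrying" errors (the logger name is an assumption; check the `logger` field in your Sentry events) and a made-up `ExpectedPipelineError` that your pipelines might raise for known, non-actionable cases.

```python
# Hypothetical filtering rules; adjust them to your project.
IGNORED_LOGGERS = {"scrapy.downloadermiddlewares.retry"}  # assumed logger name


class ExpectedPipelineError(Exception):
    """Hypothetical exception for known, non-actionable pipeline failures."""


def before_send(event, hint):
    # events created from log records carry the logger's name
    if event.get("logger") in IGNORED_LOGGERS:
        return None
    # events created from exceptions carry (type, value, traceback) in the hint
    exc_info = hint.get("exc_info")
    if exc_info and isinstance(exc_info[1], ExpectedPipelineError):
        return None
    return event  # everything else is sent unchanged
```

To use it, pass it when initializing the SDK in the extension: `sentry_sdk.init(sentry_dsn, before_send=before_send)`.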

answered Sep 20 '22 at 00:09 by Framartin
