
Scrapy: Can't override __init__ function

Tags:

python

scrapy

I have created a spider which inherits from CrawlSpider.

I need to override the __init__ function, but I always get this error:

Code:

class mySpider(CrawlSpider):

    def __init__(self):
        super(mySpider, self).__init__()
        .....

This is the error I'm getting: KeyError: Spider not found: mySpider

Without the __init__ function, everything works.

DjangoPy asked Jul 21 '12 17:07


1 Answer

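Your __init__ needs to accept any positional and keyword arguments and forward them to the parent constructor, because Scrapy may pass arguments to the spider when it creates it (for example, spider arguments given with -a on the command line):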

def __init__(self, *a, **kw):
    super(MySpider, self).__init__(*a, **kw)
    # your code here
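The effect of the signature can be shown in plain Python, with no Scrapy involved. In this sketch, Base is a hypothetical stand-in for a framework base class such as CrawlSpider, and build is a hypothetical helper that mimics a framework creating the class with keyword arguments; neither is part of Scrapy's API.

```python
class Base:
    """Hypothetical stand-in for a framework base class such as CrawlSpider."""

    def __init__(self, name=None, **kwargs):
        self.name = name
        # Keep extra keyword arguments as attributes (assumed behaviour,
        # similar in spirit to how Scrapy's Spider stores spider arguments)
        self.__dict__.update(kwargs)


class Broken(Base):
    def __init__(self):  # accepts no extra arguments at all
        super().__init__()


class Fixed(Base):
    def __init__(self, *a, **kw):  # accepts whatever the caller passes...
        super().__init__(*a, **kw)  # ...and forwards it to the parent
        self.ready = True  # your own initialisation goes after the super() call


def build(cls, **kwargs):
    """Hypothetical helper: instantiate cls with keyword arguments, as a framework might."""
    try:
        return cls(**kwargs)
    except TypeError as exc:
        return exc


print(build(Fixed, name="company", category="books").category)  # books
print(type(build(Broken, name="company")).__name__)  # TypeError
```

A subclass that declares __init__(self) cannot receive any argument the framework hands it, while the *a, **kw version passes them through unchanged and still runs its own setup afterwards.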

Working example:

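from pydispatch import dispatcher
from scrapy import signals
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    name = "company"
    allowed_domains = ["site.com"]
    start_urls = ["http://www.site.com"]

    def __init__(self, *a, **kw):
        super(MySpider, self).__init__(*a, **kw)
        # Run self.spider_closed whenever the spider_closed signal is sent
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.logger.info("Spider closed: %s", spider.name)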

Here __init__ is used to register a Scrapy signal handler in the spider. Signal handlers are usually registered in a pipeline, but in this example I needed it in the spider.

iblazevic answered Oct 29 '22 11:10