
Scrapyd init error when running scrapy spider

I'm trying to deploy a crawler with four spiders. One of the spiders uses XMLFeedSpider and runs fine both from the shell and under scrapyd. The others use BaseSpider: they run fine from the shell, but all of them fail under scrapyd with this error:

TypeError: __init__() got an unexpected keyword argument '_job'

From what I've read, this points to a problem with the __init__ function in my spiders, but I can't work out how to fix it. I don't need an __init__ function, and if I remove it completely I still get the error!

My spider looks like this:

from scrapy import log
from scrapy.spider import BaseSpider
from scrapy.selector import XmlXPathSelector
from betfeeds_master.items import Odds

# Parameters
MYGLOBAL = 39


class homeSpider(BaseSpider):
    name = "home"
    #con = None

    allowed_domains = ["www.myhome.com"]
    start_urls = [
        "http://www.myhome.com/oddxml.aspx?lang=en&subscriber=mysubscriber",
    ]
    def parse(self, response):

        items = []

        traceCompetition = ""

        xxs = XmlXPathSelector(response)
        oddsobjects = xxs.select("//OO[OddsType='3W' and Sport='Football']")
        for oddsobject in oddsobjects:
            item = Odds()
            item['competition'] = ''.join(oddsobject.select('Tournament/text()').extract())
            if traceCompetition != item['competition']:
                log.msg('Processing %s' % (item['competition']))  # print item['competition']
                traceCompetition = item['competition']
            item['matchDate'] = ''.join(oddsobject.select('Date/text()').extract())
            item['homeTeam'] = ''.join(oddsobject.select('OddsData/HomeTeam/text()').extract())
            item['awayTeam'] = ''.join(oddsobject.select('OddsData/AwayTeam/text()').extract())
            item['lastUpdated'] = ''
            item['bookie'] = MYGLOBAL
            item['home'] = ''.join(oddsobject.select('OddsData/HomeOdds/text()').extract())
            item['draw'] = ''.join(oddsobject.select('OddsData/DrawOdds/text()').extract())
            item['away'] = ''.join(oddsobject.select('OddsData/AwayOdds/text()').extract())

            items.append(item)

        return items

I can put an __init__ function into the spider, but I get exactly the same error.

def __init__(self, *args, **kwargs):
    super(homeSpider, self).__init__(*args, **kwargs)
    pass

Why is this happening and how do I solve it?

Cruachan asked Oct 03 '22 09:10



1 Answer

The correct answer was given by alecx:

My __init__ function was:

def __init__(self, domain_name):

To work when deployed as an egg under scrapyd, it should also accept extra keyword arguments:

def __init__(self, domain_name, **kwargs):

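This assumes you pass domain_name as a mandatory argument. The added **kwargs lets __init__ accept the _job keyword argument named in the error message instead of rejecting it.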
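
As a rough illustration of why the narrower signature fails, here is a minimal sketch that does not use Scrapy at all. It assumes, going by the error message in the question, that scrapyd starts the spider with an extra `_job` keyword argument. `BaseSpider` below is only a stand-in class, and `launch`, `BrokenSpider`, `FixedSpider` and the job id value are hypothetical names made up for the example.

```python
class BaseSpider(object):
    """Stand-in for the real base spider (assumption: it keeps extra kwargs as attributes)."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        self.__dict__.update(kwargs)


class BrokenSpider(BaseSpider):
    name = "broken"

    # Only accepts domain_name, so any extra keyword argument raises TypeError.
    def __init__(self, domain_name):
        super(BrokenSpider, self).__init__()
        self.domain_name = domain_name


class FixedSpider(BaseSpider):
    name = "fixed"

    # Accepts and forwards extra keyword arguments such as _job.
    def __init__(self, domain_name, **kwargs):
        super(FixedSpider, self).__init__(**kwargs)
        self.domain_name = domain_name


def launch(spider_cls, **spider_args):
    """Hypothetical launcher: adds a job id to the spider arguments, as scrapyd is assumed to do."""
    spider_args["_job"] = "hypothetical-job-id"
    return spider_cls(**spider_args)


if __name__ == "__main__":
    spider = launch(FixedSpider, domain_name="www.myhome.com")
    print(spider.domain_name, spider._job)

    try:
        launch(BrokenSpider, domain_name="www.myhome.com")
    except TypeError as exc:
        print("TypeError:", exc)
```

Running the spider from the shell with only your own arguments never sends `_job`, which would explain why the same class works there and fails only under scrapyd.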

hugsbrugs answered Oct 12 '22 11:10
