Scrapy error: TypeError: __init__() got an unexpected keyword argument 'callback'

Question

I'm trying to scrape a website by extracting all links that contain "huis" ("house" in Dutch). Following http://doc.scrapy.org/en/latest/topics/spiders.html, I wrote this spider:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

from Funda.items import FundaItem

class FundaSpider(scrapy.Spider):
    name = "Funda"
    allowed_domains = ["funda.nl"]
    start_urls = [
        "http://www.funda.nl/koop/amsterdam/"
    ]

    rules = (
    Rule(LinkExtractor(allow=r'.*huis.*', callback='parse_item'))
    )

    def parse_item(self, response):
        item = FundaItem()
        item['title'] = response.extract()
        return item

However, I get this error message:

Rule(LinkExtractor(allow=r'.*huis.*', callback='parse_item'))
TypeError: __init__() got an unexpected keyword argument 'callback'

From a previous post (Scrapy Error: TypeError: __init__() got an unexpected keyword argument 'deny') it looks like a possible cause is misplaced parentheses, such that the keyword is passed to Rule instead of LinkExtractor. In this case, however, it seems to me that callback is inside the LinkExtractor parentheses, as intended.

Any ideas what is causing this error?

Kevin · Accepted Answer

Yes, callback is definitely being passed to LinkExtractor. That is actually the problem: callback is not among the parameters the documentation lists for that class.

The Rule class, on the other hand, does list a callback parameter in the documentation, so it should be passed to Rule instead of LinkExtractor:

Rule(LinkExtractor(allow=r'.*huis.*'), callback='parse_item')
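If you'd rather check this from code than from the documentation, you can inspect the constructor signatures. The sketch below uses two stand-in classes, StandInLinkExtractor and StandInRule, which are hypothetical simplifications and not Scrapy's real classes; accepts_keyword is likewise a helper made up for illustration. The same inspect.signature call should work on the real classes, but I haven't verified their exact parameter lists here.

```python
import inspect


class StandInLinkExtractor:
    """Hypothetical stand-in: only takes link-matching options."""

    def __init__(self, allow=(), deny=()):
        self.allow = allow
        self.deny = deny


class StandInRule:
    """Hypothetical stand-in: takes an extractor plus a callback name."""

    def __init__(self, link_extractor, callback=None):
        self.link_extractor = link_extractor
        self.callback = callback


def accepts_keyword(cls, name):
    """Return True if the constructor of cls declares a parameter called name."""
    return name in inspect.signature(cls).parameters


print(accepts_keyword(StandInLinkExtractor, 'callback'))  # False
print(accepts_keyword(StandInRule, 'callback'))           # True
```

Note that this check only looks at explicitly declared parameters; a constructor that takes **kwargs would accept any keyword even though the name is not listed.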

If you're thinking "but why did the answerer of the linked question put callback inside the LinkExtractor call?", I think you may be misinterpreting the nesting of the parentheses, which is admittedly somewhat confusing. Changing the layout makes it a little clearer:

rules = (
    Rule(
        LinkExtractor(
            allow=[r'/*'], 
            deny=('blogs/*', 'videos/*', )
        ),
        callback='parse_html'
    ), 
)
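To see how much one misplaced parenthesis matters, here is a minimal sketch that reproduces the error without Scrapy installed. StandInLinkExtractor and StandInRule are hypothetical stand-ins that only mimic the relevant part of the constructor signatures (an assumption for illustration, not Scrapy's actual code).

```python
class StandInLinkExtractor:
    """Hypothetical stand-in: only takes link-matching options."""

    def __init__(self, allow=(), deny=()):
        self.allow = allow
        self.deny = deny


class StandInRule:
    """Hypothetical stand-in: takes an extractor plus a callback name."""

    def __init__(self, link_extractor, callback=None):
        self.link_extractor = link_extractor
        self.callback = callback


def build_wrong():
    # callback sits inside the extractor's parentheses, as in the question
    return StandInRule(StandInLinkExtractor(allow=r'.*huis.*', callback='parse_item'))


def build_right():
    # the extractor call is closed first, so callback goes to the rule
    return StandInRule(StandInLinkExtractor(allow=r'.*huis.*'), callback='parse_item')


try:
    build_wrong()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

rule = build_right()
print(error_message)   # complains about the unexpected keyword 'callback'
print(rule.callback)   # parse_item
```

In the real spider, two further details matter as far as I know: rules are only processed by CrawlSpider subclasses (the class in the question subclasses scrapy.Spider), and rules should be an iterable, so a single-rule tuple needs the trailing comma shown in the layout above.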

Tags:

python

scrapy

Kurt Peek
