I'm trying to scrape a website by extracting all links with "huis" (="house" in Dutch) in them. Following http://doc.scrapy.org/en/latest/topics/spiders.html, I wrote this spider:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from Funda.items import FundaItem

class FundaSpider(scrapy.Spider):
    name = "Funda"
    allowed_domains = ["funda.nl"]
    start_urls = [
        "http://www.funda.nl/koop/amsterdam/"
    ]
    rules = (
        Rule(LinkExtractor(allow=r'.*huis.*', callback='parse_item'))
    )

    def parse_item(self, response):
        item = FundaItem()
        item['title'] = response.extract()
        return item
However, I'm getting the error message

    Rule(LinkExtractor(allow=r'.*huis.*', callback='parse_item'))
TypeError: __init__() got an unexpected keyword argument 'callback'
From a previous post (Scrapy Error: TypeError: __init__() got an unexpected keyword argument 'deny') it looks like a possible reason is mismatched brackets, such that the keyword is passed to Rule instead of LinkExtractor. It seems to me that in this case, however, callback is within the LinkExtractor bracket as intended.
Any ideas what is causing this error?
Yes, callback is definitely being passed to LinkExtractor. That seems to be the problem, actually, because I don't see callback under the expected parameters for that class in the documentation.
The Rule class, on the other hand, does list a callback parameter in the documentation. So it looks like you're supposed to pass it to Rule instead of LinkExtractor, by closing the LinkExtractor parenthesis before the callback keyword:
Rule(LinkExtractor(allow=r'.*huis.*'), callback='parse_item')
If you're thinking "but why did the answerer of the linked question put callback inside the LinkExtractor call?", I think you may be misinterpreting the nesting of the parentheses, which is admittedly somewhat confusing. Changing the layout makes it a little clearer:
rules = (
    Rule(
        LinkExtractor(
            allow=[r'/*'],
            deny=('blogs/*', 'videos/*', )
        ),
        callback='parse_html'
    ),
)
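To see the effect of the parenthesis placement without installing Scrapy, here is a minimal sketch. StubLinkExtractor and StubRule are hypothetical stand-ins, not Scrapy's real classes; they only mirror the point made above, that the extractor takes allow/deny while the rule takes callback. The real constructors accept more parameters than shown.

```python
# Hypothetical stand-ins, NOT Scrapy's implementation. They only model
# which keyword arguments each constructor is assumed to accept.
class StubLinkExtractor:
    def __init__(self, allow=(), deny=()):
        self.allow = allow
        self.deny = deny


class StubRule:
    def __init__(self, link_extractor, callback=None):
        self.link_extractor = link_extractor
        self.callback = callback


def build_misplaced():
    # callback sits inside the extractor's parentheses, as in the question
    return StubRule(StubLinkExtractor(allow=r'.*huis.*', callback='parse_item'))


def build_correct():
    # the extractor call is closed first, then callback goes to the rule
    return StubRule(StubLinkExtractor(allow=r'.*huis.*'), callback='parse_item')


try:
    build_misplaced()
    error_message = None
except TypeError as exc:
    # The extractor's __init__ rejects the keyword it does not know
    error_message = str(exc)

rule = build_correct()
rules = (rule,)  # the trailing comma makes this a one-element tuple

print(error_message)
print(rule.callback, type(rules).__name__)
```

The misplaced version fails inside the extractor's constructor, before the rule is ever built, which is why the traceback complains about callback even though the line starts with Rule(. Note also the trailing comma in the reformatted rules above: without it, the outer parentheses are just grouping and rules would be a single object rather than a tuple.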