 

Passing a custom parameter to a Scrapy request

Tags: python, scrapy

I want to set a custom parameter on my request so I can retrieve it when I process the response in parse_item. This is my code:

from scrapy.http import Request

def start_requests(self):
    yield Request("site_url", meta={'test_meta_key': 'test_meta_value'})

def parse_item(self, response):
    print(response.meta)

parse_item will be called according to the following rules:

self.rules = (
    Rule(SgmlLinkExtractor(deny=tuple(self.deny_keywords), allow=tuple(self.client_keywords)), callback='parse_item'),
    Rule(SgmlLinkExtractor(deny=tuple(self.deny_keywords), allow=('',))),
)

According to the Scrapy docs:

the Response.meta attribute is propagated along redirects and retries, so you will get the original Request.meta sent from your spider.

But I don't see the custom meta in parse_item. Is there any way to fix this? Is meta the right way to go?

Asked Jan 14 '14 by AliBZ

1 Answer

When you generate a new Request, you need to specify the callback function explicitly; otherwise the response is handed to CrawlSpider's default parse() method, and parse_item never runs for that request.

I ran into a similar problem and it took me a while to debug.
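
As an illustration, here is a minimal sketch of the fix. The spider name is made up and 'site_url' is the placeholder from the question:

from scrapy.http import Request
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):  # hypothetical spider, stands in for yours
    name = 'my_spider'

    def start_requests(self):
        # Without an explicit callback, CrawlSpider handles this response
        # with its internal parse() and parse_item is never called for it.
        yield Request('site_url',
                      callback=self.parse_item,
                      meta={'test_meta_key': 'test_meta_value'})

    def parse_item(self, response):
        print(response.meta)  # now includes 'test_meta_key'

The relevant parts of the Request docs: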

callback (callable) – the function that will be called with the response of this request (once its downloaded) as its first parameter. For more information see Passing additional data to callback functions below. If a Request doesn’t specify a callback, the spider’s parse() method will be used. Note that if exceptions are raised during processing, errback is called instead.

method (string) – the HTTP method of this request. Defaults to 'GET'.

meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied.
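
Also note that the requests CrawlSpider generates from your rules start with a fresh meta dict, so a value set in start_requests won't automatically show up in parse_item for rule-extracted links. If that's what you need, a Rule's process_request hook can copy it forward. A rough sketch, assuming Scrapy 2.0+ (where the hook receives both the request and the originating response; copy_meta is a made-up method name):

from scrapy.http import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class MySpider(CrawlSpider):
    name = 'my_spider'

    rules = (
        Rule(LinkExtractor(), callback='parse_item', process_request='copy_meta'),
    )

    def start_requests(self):
        # No callback here: CrawlSpider's own parse() applies the rules.
        yield Request('site_url', meta={'test_meta_key': 'test_meta_value'})

    def copy_meta(self, request, response):
        # Rule-generated requests get fresh meta, so carry the custom
        # value over from the originating response by hand.
        request.meta['test_meta_key'] = response.meta.get('test_meta_key')
        return request

    def parse_item(self, response):
        print(response.meta)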

Answered Sep 27 '22 by B.Mr.W.