 

Passing arguments inside Scrapy spider through lambda callbacks

Hi,

I have this short spider code:

from scrapy.http import Request
from scrapy.spiders import CrawlSpider

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["google.com", "yahoo.com"]
    start_urls = [
        "http://google.com"
    ]

    def parse2(self, response, i):
        print "page2, i: ", i
        # traceback.print_stack()


    def parse(self, response):
        for i in range(5):
            print "page1 i : ", i
            link = "http://www.google.com/search?q=" + str(i)
            yield Request(link, callback=lambda r:self.parse2(r, i))

and I would expect the output like this:

page1 i :  0
page1 i :  1
page1 i :  2
page1 i :  3
page1 i :  4

page2 i :  0
page2 i :  1
page2 i :  2
page2 i :  3
page2 i :  4

However, the actual output is this:

page1 i :  0
page1 i :  1
page1 i :  2
page1 i :  3
page1 i :  4

page2 i :  4
page2 i :  4
page2 i :  4
page2 i :  4
page2 i :  4

So the argument I pass via callback=lambda r: self.parse2(r, i) is somehow wrong.

What's wrong with the code?

mamamia asked Oct 08 '10

1 Answer

According to the Scrapy documentation, using a lambda will prevent the library's Jobs (pause/resume) functionality from working, because requests with lambda callbacks cannot be serialized to disk (http://doc.scrapy.org/en/latest/topics/jobs.html).
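Separately from the Jobs issue, the lambda also explains the repeated "page2 i : 4" output: a Python closure binds the variable i itself, not the value i had when the lambda was created, so by the time Scrapy runs the callbacks the loop has finished and every callback sees the final value. A minimal, Scrapy-free sketch of the same effect:

```python
# Each lambda closes over the variable i, not the value i had when
# the lambda was created, so all of them see i's final value (4).
callbacks = [lambda: i for i in range(5)]
print([f() for f in callbacks])  # [4, 4, 4, 4, 4]

# Binding i as a default argument freezes the current value instead.
fixed = [lambda i=i: i for i in range(5)]
print([f() for f in fixed])  # [0, 1, 2, 3, 4]
```

The default-argument trick would fix the output, but it still leaves an unserializable lambda in the request, which is why the meta approach below is preferred.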

Request() and FormRequest() both accept a dictionary named meta, which can be used to pass arguments from one callback to the next.

from scrapy.http import Request

def some_callback(self, response):
    somearg = 'test'
    yield Request('http://www.example.com',
                  meta={'somearg': somearg},
                  callback=self.other_callback)

def other_callback(self, response):
    # meta travels with the request and is exposed on the response
    somearg = response.meta['somearg']
    print "the argument passed is:", somearg
TomDotTom answered Oct 22 '22