Hi,
I have this short spider code:
from scrapy.spiders import CrawlSpider
from scrapy.http import Request

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["google.com", "yahoo.com"]
    start_urls = [
        "http://google.com"
    ]

    def parse2(self, response, i):
        print "page2, i: ", i
        # traceback.print_stack()

    def parse(self, response):
        for i in range(5):
            print "page1 i : ", i
            link = "http://www.google.com/search?q=" + str(i)
            yield Request(link, callback=lambda r: self.parse2(r, i))
and I would expect the output to look like this:
page1 i : 0
page1 i : 1
page1 i : 2
page1 i : 3
page1 i : 4
page2 i : 0
page2 i : 1
page2 i : 2
page2 i : 3
page2 i : 4
However, the actual output is this:
page1 i : 0
page1 i : 1
page1 i : 2
page1 i : 3
page1 i : 4
page2 i : 4
page2 i : 4
page2 i : 4
page2 i : 4
page2 i : 4
So the argument I pass in callback=lambda r: self.parse2(r, i) is somehow wrong.
What's wrong with the code?
The lambda does not capture the value of i at the moment each Request is created; it closes over the variable i itself, so by the time Scrapy actually calls the callbacks the loop has finished and i is 4 for every one of them. Beyond that, the Scrapy documentation notes that lambda callbacks cannot be serialized, so using them prevents the library's Jobs (pause/resume) functionality from working (http://doc.scrapy.org/en/latest/topics/jobs.html).
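This late binding is ordinary Python behaviour rather than anything Scrapy-specific; a minimal standalone sketch, with plain lambdas standing in for the deferred callbacks, shows the same effect:

# Each lambda closes over the *variable* i, not its value at append time.
callbacks = []
for i in range(5):
    callbacks.append(lambda: i)

# By the time the lambdas run, the loop is done and i is 4.
print [cb() for cb in callbacks]   # prints [4, 4, 4, 4, 4]

# Binding i as a default argument (lambda i=i: ...) would freeze each value,
# but the callback would still be a lambda, which Jobs cannot persist.

That is why the usual Scrapy approach is to pass the value along with the request instead.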
Request() and FormRequest() both carry a dictionary named meta, which can be used to pass arguments to the callback.
def some_callback(self, response):
    somearg = 'test'
    yield Request('http://www.example.com',
                  meta={'somearg': somearg},
                  callback=self.other_callback)

def other_callback(self, response):
    somearg = response.meta['somearg']
    print "the argument passed is:", somearg