I'm using a loop to generate my requests inside start_requests() and I'd like to pass the loop index to parse() so it can store it in the item. However, when I use self.i, every item ends up with the maximum value of i (the value from the last loop iteration). I could extract the index from response.url with a regex, but I wonder if there is a clean way to pass a variable from start_requests() to parse().
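The self.i symptom can be reproduced without Scrapy. The sketch below is a simplified model, assumed for illustration, in which every request is queued before any callback runs (roughly what happens when Scrapy schedules a short list of start requests before the first response arrives). The class names and the run() helper are hypothetical. A callback that reads a shared attribute sees whatever value it holds when the callback finally runs, while a value stored with each queued request keeps its own index.

```python
# Hypothetical model, not Scrapy: requests are queued in a loop and their
# callbacks only run after the loop has finished.

class SharedStateSpider:
    def __init__(self):
        self.i = None
        self.pending = []  # queued (callback, url, kwargs) triples

    def start_requests(self, urls):
        for index, url in enumerate(urls):
            self.i = index  # overwritten on every iteration
            self.pending.append((self.parse, url, {}))

    def parse(self, url):
        # reads self.i when the callback runs, i.e. after the loop ended
        return {"url": url, "index": self.i}


class PerRequestSpider:
    def __init__(self):
        self.pending = []

    def start_requests(self, urls):
        for index, url in enumerate(urls):
            # the index is stored with this particular queued request
            self.pending.append((self.parse, url, {"index": index}))

    def parse(self, url, index):
        return {"url": url, "index": index}


def run(spider, urls):
    """Queue everything first, then run the callbacks (the simplified model)."""
    spider.start_requests(urls)
    return [callback(url, **kwargs) for callback, url, kwargs in spider.pending]


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
print([item["index"] for item in run(SharedStateSpider(), urls)])  # [2, 2, 2]
print([item["index"] for item in run(PerRequestSpider(), urls)])   # [0, 1, 2]
```

Both answers below follow the second pattern: the value travels with the request instead of living on the spider.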
You can use the meta argument of scrapy.Request and read the value back from response.meta in the callback:
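```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # attach the loop index to this particular request
            yield scrapy.Request(url, meta={'index': index})

    def parse(self, response):
        print(response.url)
        # response.meta is the meta dict of the request that produced this response
        print(response.meta['index'])
```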
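Alternatively, you can pass the cb_kwargs argument to scrapy.Request() (available since Scrapy 1.7). Each entry in that dict is passed to the callback as a keyword argument:
https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs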
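```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # every key in cb_kwargs becomes a keyword argument of the callback
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index': index})

    def parse(self, response, index):
        # index is the value passed for this particular request
        yield {'url': response.url, 'index': index}
```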