 

Is it possible to pass a variable from start_requests() to parse() for each individual request?

Tags: scrapy

I'm using a loop to generate my requests inside start_requests() and I'd like to pass the loop index to parse() so it can be stored in the item. However, when I use self.i, every item ends up with the maximum value of i (the last loop turn). I could extract the index from response.url with a regex, but I wonder if there is a clean way to pass a variable from start_requests() to parse().
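For illustration, here is a minimal sketch of the pattern described above; the spider name, the placeholder URLs and the self.i counter are hypothetical and just mirror the question:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical name, for illustration only
    urls = ['http://example.com/page/%d' % n for n in range(5)]  # placeholder URLs

    def start_requests(self):
        for i, url in enumerate(self.urls):
            self.i = i  # shared attribute, overwritten on every loop turn
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # requests are downloaded asynchronously, so by the time a response
        # arrives self.i has usually already been advanced to the last index
        yield {'index': self.i, 'url': response.url}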

Asked Jan 01 '17 by ChiseledAbs


2 Answers

You can use the scrapy.Request meta attribute:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, meta={'index': index})

    def parse(self, response):
        print(response.url)
        print(response.meta['index'])
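Each Request carries its own meta dict, so the index travels with that specific request and comes back untouched in response.meta. Keep in mind that Scrapy also stores internal keys in meta (for example depth and download-related settings), so choose key names that won't collide.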
Answered Sep 24 '22 by Granitosaurus


You can pass the cb_kwargs argument to scrapy.Request():

https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index': index})

    def parse(self, response, index):
        pass
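Note that cb_kwargs requires Scrapy 1.7 or later; on older versions you have to fall back to meta. The keyword arguments arrive as regular parameters of the callback, so inside parse() you can use index directly, for example yield {'index': index, 'url': response.url}.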
Answered Sep 24 '22 by jay padaliya