I have an item object and I need to pass it along many pages to store the data in a single item.
My item looks like this:
    from scrapy.item import Item, Field

    class DmozItem(Item):
        title = Field()
        desc1 = Field()
        desc2 = Field()
        desc3 = Field()
Now, those three descriptions are on three separate pages, and I want to fill them in one by one.
This works fine for parseDescription1:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        items = []
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = item
        return request

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return item
But I want something like this:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        items = []
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = item
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription2)
        request.meta['item'] = item
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription3)
        request.meta['item'] = item
        return request

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return item

    def parseDescription2(self, response):
        item = response.meta['item']
        item['desc2'] = "test2"
        return item

    def parseDescription3(self, response):
        item = response.meta['item']
        item['desc3'] = "test3"
        return item
This is an old topic, but for anyone who needs it: to pass an extra parameter to a callback you should use cb_kwargs, then pick up the parameter in the parse method. You can refer to this part of the documentation.
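For example, a minimal sketch (the spider, URL, and parse_details callback here are placeholders, not from the original question):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            item = {"title": response.css("title::text").get()}
            # cb_kwargs entries are passed to the callback as keyword arguments
            yield scrapy.Request(
                "http://www.example.com/lin1.cpp",
                callback=self.parse_details,
                cb_kwargs={"item": item},
            )

        def parse_details(self, response, item):
            # 'item' arrives as a regular parameter; no response.meta needed
            item["desc1"] = "test"
            yield item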
In the callback function, you parse the response (web page) and return either Item objects, Request objects, or an iterable of both. Those requests will also contain a callback (maybe the same one), will then be downloaded by Scrapy, and their responses handled by the specified callback.
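For instance, a single callback can emit a finished item and schedule further pages at the same time. A minimal sketch (the selectors are made up):

    import scrapy

    # inside your Spider subclass
    def parse(self, response):
        # a completed item goes straight to the item pipeline
        yield {"title": response.css("h1::text").get()}
        # ...while each "next" link is scheduled and handled by this same callback
        for href in response.css("a.next::attr(href)").getall():
            yield scrapy.Request(response.urljoin(href), callback=self.parse)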
No problem. The following is a corrected version of your code:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        item = DmozItem()

        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = item
        yield request

        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription2,
                          meta={'item': item})
        yield request

        yield Request("http://www.example.com/lin1.cpp",
                      callback=self.parseDescription3,
                      meta={'item': item})

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return item

    def parseDescription2(self, response):
        item = response.meta['item']
        item['desc2'] = "test2"
        return item

    def parseDescription3(self, response):
        item = response.meta['item']
        item['desc3'] = "test3"
        return item
To guarantee an ordering of the requests/callbacks, and that only one item is ultimately returned, you need to chain your requests using a form like this:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = DmozItem()
        return [request]

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return [Request("http://www.example.com/lin2.cpp",
                        callback=self.parseDescription2,
                        meta={'item': item})]

    def parseDescription2(self, response):
        item = response.meta['item']
        item['desc2'] = "test2"
        return [Request("http://www.example.com/lin3.cpp",
                        callback=self.parseDescription3,
                        meta={'item': item})]

    def parseDescription3(self, response):
        item = response.meta['item']
        item['desc3'] = "test3"
        return [item]
Each callback returns an iterable of items or requests; the requests get scheduled, and the items run through your item pipeline.
If you return an item from each of the callbacks, you'll end up with three items in various states of completeness in your pipeline, but if each callback returns the next request instead, you can guarantee the order of the requests and that you will have exactly one item at the end of execution.
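On Scrapy 1.7+ the same chaining pattern can also be written with cb_kwargs instead of meta, as in the earlier answer; a minimal sketch reusing the question's placeholder URLs:

    def page_parser(self, response):
        yield Request("http://www.example.com/lin1.cpp",
                      callback=self.parseDescription1,
                      cb_kwargs={"item": DmozItem()})

    def parseDescription1(self, response, item):
        item["desc1"] = "test"
        yield Request("http://www.example.com/lin2.cpp",
                      callback=self.parseDescription2,
                      cb_kwargs={"item": item})

    def parseDescription2(self, response, item):
        item["desc2"] = "test2"
        yield Request("http://www.example.com/lin3.cpp",
                      callback=self.parseDescription3,
                      cb_kwargs={"item": item})

    def parseDescription3(self, response, item):
        item["desc3"] = "test3"
        yield item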