I have an item object and I need to pass it along many pages to store the data in a single item.
My item looks like this:
    from scrapy.item import Item, Field

    class DmozItem(Item):
        title = Field()
        desc1 = Field()
        desc2 = Field()
        desc3 = Field()
Now, those three descriptions are on three separate pages, and I want to fill them in one by one.
This works fine for parseDescription1:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        items = []
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = item
        return request

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return item
But I want something like this:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        items = []
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = item
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription2)
        request.meta['item'] = item
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription3)
        request.meta['item'] = item
        return request

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return item

    def parseDescription2(self, response):
        item = response.meta['item']
        item['desc2'] = "test2"
        return item

    def parseDescription3(self, response):
        item = response.meta['item']
        item['desc3'] = "test3"
        return item
This is an old topic, but for anyone who needs it: to pass an extra parameter to a callback you should use cb_kwargs, then pick up the parameter in the parse method. You can refer to this part of the documentation.
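For example, a minimal sketch (the spider, URL, and parse_details callback here are placeholders, not from the original question):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            item = {"title": response.css("title::text").get()}
            # cb_kwargs entries are passed to the callback as keyword arguments
            yield scrapy.Request(
                "http://www.example.com/lin1.cpp",
                callback=self.parse_details,
                cb_kwargs={"item": item},
            )

        def parse_details(self, response, item):
            # 'item' arrives as a regular parameter; no response.meta needed
            item["desc1"] = "test"
            yield item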
In the callback function, you parse the response (web page) and return either Item objects, Request objects, or an iterable of both. Those requests will also contain a callback (maybe the same one), will then be downloaded by Scrapy, and their responses handled by the specified callback.
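For instance, a single callback can emit a finished item and schedule further pages at the same time. A minimal sketch (the selectors are made up):

    import scrapy

    # inside your Spider subclass
    def parse(self, response):
        # a completed item goes straight to the item pipeline
        yield {"title": response.css("h1::text").get()}
        # ...while each "next" link is scheduled and handled by this same callback
        for href in response.css("a.next::attr(href)").getall():
            yield scrapy.Request(response.urljoin(href), callback=self.parse)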
No problem. The following is a corrected version of your code:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        item = DmozItem()

        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = item
        yield request

        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription2,
                          meta={'item': item})
        yield request

        yield Request("http://www.example.com/lin1.cpp",
                      callback=self.parseDescription3,
                      meta={'item': item})

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return item

    def parseDescription2(self, response):
        item = response.meta['item']
        item['desc2'] = "test2"
        return item

    def parseDescription3(self, response):
        item = response.meta['item']
        item['desc3'] = "test3"
        return item
To guarantee an ordering of the requests/callbacks, and that only one item is ultimately returned, you need to chain your requests using a form like this:
    def page_parser(self, response):
        sites = hxs.select('//div[@class="row"]')
        request = Request("http://www.example.com/lin1.cpp",
                          callback=self.parseDescription1)
        request.meta['item'] = DmozItem()
        return [request]

    def parseDescription1(self, response):
        item = response.meta['item']
        item['desc1'] = "test"
        return [Request("http://www.example.com/lin2.cpp",
                        callback=self.parseDescription2,
                        meta={'item': item})]

    def parseDescription2(self, response):
        item = response.meta['item']
        item['desc2'] = "test2"
        return [Request("http://www.example.com/lin3.cpp",
                        callback=self.parseDescription3,
                        meta={'item': item})]

    def parseDescription3(self, response):
        item = response.meta['item']
        item['desc3'] = "test3"
        return [item]
Each callback returns an iterable of items or requests; the requests get scheduled, and the items run through your item pipeline.
If you return an item from each of the callbacks, you'll end up with three items in various states of completeness in your pipeline, but if each callback returns the next request instead, you can guarantee the order of the requests and that you will have exactly one item at the end of execution.
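On Scrapy 1.7+ the same chaining pattern can also be written with cb_kwargs instead of meta, as in the earlier answer; a minimal sketch reusing the question's placeholder URLs:

    def page_parser(self, response):
        yield Request("http://www.example.com/lin1.cpp",
                      callback=self.parseDescription1,
                      cb_kwargs={"item": DmozItem()})

    def parseDescription1(self, response, item):
        item["desc1"] = "test"
        yield Request("http://www.example.com/lin2.cpp",
                      callback=self.parseDescription2,
                      cb_kwargs={"item": item})

    def parseDescription2(self, response, item):
        item["desc2"] = "test2"
        yield Request("http://www.example.com/lin3.cpp",
                      callback=self.parseDescription3,
                      cb_kwargs={"item": item})

    def parseDescription3(self, response, item):
        item["desc3"] = "test3"
        yield item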