Scrapy: how to stop a redirect (302)

I'm trying to crawl a URL using Scrapy, but it redirects me to a page that doesn't exist.

Redirecting (302) to <GET http://www.shop.inonit.in/mobile/Products/Inonit-Home-Decor--Knick-Knacks-Cushions/Shor-Sharaba/Andaz-Apna-Apna-Cushion-Cover/1275197> from <GET http://www.shop.inonit.in/Products/Inonit-Home-Decor--Knick-Knacks-Cushions/Shor-Sharaba/Andaz-Apna-Apna-Cushion-Cover/pid-1275197.aspx>

The problem is that http://www.shop.inonit.in/Products/Inonit-Home-Decor--Knick-Knacks-Cushions/Shor-Sharaba/Andaz-Apna-Apna-Cushion-Cover/pid-1275197.aspx exists, but http://www.shop.inonit.in/mobile/Products/Inonit-Home-Decor--Knick-Knacks-Cushions/Shor-Sharaba/Andaz-Apna-Apna-Cushion-Cover/1275197 doesn't, so the crawler can't find the page. I've crawled many other websites and haven't had this problem anywhere else. Is there a way I can stop this redirect?

Any help would be much appreciated. Thanks.

Update: This is my spider class

class Inon_Spider(BaseSpider):
    name = 'Inon'
    allowed_domains = ['www.shop.inonit.in']

    start_urls = ['http://www.shop.inonit.in/Products/Inonit-Gadget-Accessories-Mobile-Covers/-The-Red-Tag/Samsung-Note-2-Dead-Mau/pid-2656465.aspx']

    def parse(self, response):

        item = DealspiderItem()
        hxs = HtmlXPathSelector(response)

        title = hxs.select('//div[@class="aboutproduct"]/div[@class="container9"]/div[@class="ctl_aboutbrand"]/h1/text()').extract()
        price = hxs.select('//span[@id="ctl00_ContentPlaceHolder1_Price_ctl00_spnWebPrice"]/span[@class="offer"]/span[@id="ctl00_ContentPlaceHolder1_Price_ctl00_lblOfferPrice"]/text()').extract()
        prc = price[0].replace("Rs.  ","")
        description = []

        item['price'] = prc
        item['title'] = title
        item['description'] = description
        item['url'] = response.url

        return item

asked Mar 18 '13 12:03

user_2000

1 Answer

Yes, you can do this by adding a meta value to the request:

meta={'dont_redirect': True}

You can also name the particular response codes that you want to handle yourself instead of having them redirected:

meta={'dont_redirect': True,"handle_httpstatus_list": [302]}

With this, a 302 response is passed to your callback instead of being followed. You can list as many HTTP status codes in handle_httpstatus_list as you need.

Example:

yield Request('some url',
    meta = {
        'dont_redirect': True,
        'handle_httpstatus_list': [302]
    },
    callback= self.some_call_back)

answered Sep 23 '22 14:09

akhter wahab
