
How to get the original URL when a request is redirected in Scrapy?

Scrapy version: 0.20

Problem:

start_urls = [URL1, URL2, URL3]

def parse(self, response):
    # Suppose URL2 is redirected to another URL.
    # I need the start URL that was originally requested (before redirection).
    pass

I have tried response.request.url, but it is the same as response.url.

Please help me out.

sushma asked Dec 16 '22 00:12



1 Answer

If RedirectMiddleware is enabled (it is by default), it records the URLs a request was redirected from in the request's meta under the redirect_urls key, so you can try:

original_url = response.meta.get('redirect_urls', [response.url])[0]

See https://github.com/scrapy/scrapy/blob/master/scrapy/downloadermiddlewares/redirect.py#L35 for implementation details
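As a rough sketch of how that one-liner behaves, the lookup can be wrapped in a small helper. The name get_original_url and the SimpleNamespace stand-ins below are illustrative assumptions, not part of Scrapy; they only mimic the url and meta attributes of a response so the snippet runs without a crawl.

```python
from types import SimpleNamespace


def get_original_url(response):
    """Return the first URL requested, before any redirects (hypothetical helper)."""
    # RedirectMiddleware stores the redirected-from URLs in meta['redirect_urls'];
    # the first entry is the URL that was originally requested.
    # Without a redirect the key is absent, so fall back to response.url.
    return response.meta.get('redirect_urls', [response.url])[0]


# Stand-in objects that imitate a response's .url and .meta (assumption:
# real Scrapy responses expose the same two attributes).
redirected = SimpleNamespace(
    url='http://example.com/new',
    meta={'redirect_urls': ['http://example.com/old']},
)
direct = SimpleNamespace(url='http://example.com/page', meta={})

print(get_original_url(redirected))
print(get_original_url(direct))
```

Inside a spider, the same call would go at the top of parse, for example original_url = get_original_url(response).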

paul trmbrth answered Mar 11 '23 13:03
