
Getting the final redirected URL

My code is as follows:

import urllib.request

url_orig = 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'
u = urllib.request.urlopen(url_orig)
print(u.geturl())

The URL is redirected twice, so the output should be:

http://www.has-sante.fr/portail/upload/docs/application/pdf/2008-07/ct-5245_prialt_.pdf

But the output that I'm getting is the first redirect:

http://www.has-sante.fr/portail/plugins/ModuleXitiKLEE/types/FileDocument/doXiti.jsp?id=c_676945

How do I get the required final URL? Any help would be appreciated!
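A likely reason for this, stated as an assumption since the page source is not shown here: urlopen() follows HTTP 3xx redirects automatically, so geturl() already reports the target of the last HTTP redirect, and the second hop happens inside the doXiti.jsp page itself (an HTML meta refresh or JavaScript), which urllib never executes. The sketch below reproduces that situation offline with a hypothetical local server (DemoHandler and the paths /start, /tracker and /final.pdf are made up) and adds a hypothetical final_url() helper that also follows `<meta http-equiv="refresh">` tags using html.parser instead of regular expressions. It would not help if the real page redirects with JavaScript.

```python
import threading
import urllib.request
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /start -> HTTP 302 -> /tracker -> meta refresh -> /final.pdf."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/tracker")
            self.end_headers()
            return
        if self.path == "/tracker":
            body = b'<html><head><meta http-equiv="refresh" content="0; url=/final.pdf"></head></html>'
            ctype = "text/html"
        else:
            body = b"%PDF-1.4 demo"
            ctype = "application/pdf"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


class MetaRefreshFinder(HTMLParser):
    """Records the URL of the first <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "meta" or self.target is not None:
            return
        if (a.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "0; url=/final.pdf"
        _, _, rest = (a.get("content") or "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


# Same default handlers as urlopen() (including HTTPRedirectHandler),
# but with proxies disabled so 127.0.0.1 is always reached directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def final_url(url, max_hops=10):
    """Follow HTTP redirects (handled by the opener) plus meta-refresh redirects."""
    for _ in range(max_hops):
        with opener.open(url) as resp:
            url = resp.geturl()  # already past any HTTP 3xx redirects
            if "html" not in resp.headers.get_content_type():
                return url
            finder = MetaRefreshFinder()
            finder.feed(resp.read().decode("utf-8", "replace"))
        if finder.target is None:
            return url
        url = urljoin(url, finder.target)
    return url


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with opener.open(base + "/start") as u:
    plain = u.geturl()
print(plain)                       # ends with /tracker: only the HTTP redirect was followed
print(final_url(base + "/start"))  # ends with /final.pdf
```

Against the real site you would call final_url(url_orig); whether it reaches the PDF depends on how doXiti.jsp actually performs its redirect.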

user3691767 asked Jun 21 '14 07:06


1 Answer

This might be overkill for what you want, but it is an alternative to parsing the page with regular expressions. This answer uses the Selenium WebDriver Python API to load the URL in a real browser, which follows the redirects the way a normal visit would, including redirects performed by the page itself (for example with JavaScript) that urllib does not execute. It will also open the PDF file in a browser window. The code below uses Firefox, but you can use another browser by replacing the class name, e.g. webdriver.Chrome() or webdriver.Ie().

To install Selenium: pip install selenium

The code:

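from selenium import webdriver

# Start a Firefox session; swap in webdriver.Chrome() etc. for other browsers
driver = webdriver.Firefox()
link = 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'

# Load the page; the browser follows the redirects as in a normal visit
driver.get(link)
# Print the URL the browser is on after loading
print(driver.current_url)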

It is also possible to run the browser headless, so that no window pops up. An added benefit of this approach is that if the site changes how the redirection works, you will not need to update any regular expressions in your code.
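A minimal sketch of the headless variant, assuming Selenium 4 and a local Firefox installation; get_final_url is a hypothetical helper name. The -headless argument asks Firefox to run without a visible window, and quit() closes the browser afterwards, so unlike the code above no PDF is left open on screen.

```python
def get_final_url(link):
    """Load `link` in headless Firefox and return the URL the browser ends up on."""
    # Imported inside the function so that merely defining it does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(link)
        return driver.current_url
    finally:
        driver.quit()  # close the browser even if get() raises
```

Usage would be print(get_final_url(link)) with the same link as above.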

William Denman answered Oct 01 '22 21:10