Getting the final redirected URL

Question

My code is as follows:

import urllib.request

url_orig = 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'
u = urllib.request.urlopen(url_orig)
print(u.geturl())

The URL gets redirected twice, so the output should be:

http://www.has-sante.fr/portail/upload/docs/application/pdf/2008-07/ct-5245_prialt_.pdf

But the output I'm getting is only the first redirect:

http://www.has-sante.fr/portail/plugins/ModuleXitiKLEE/types/FileDocument/doXiti.jsp?id=c_676945

How do I get the required final URL? Any help would be appreciated!

William Denman · Accepted Answer

This might be overkill for what you want, but it is an alternative to using regular expressions. This answer uses the Selenium WebDriver Python API to follow the redirects in a real browser. It will also open the PDF file in a browser window. The code below assumes you are using Firefox, but you can use another browser by swapping in its driver, e.g. webdriver.Chrome() or webdriver.Ie().

To install selenium: pip install selenium

The code:

from selenium import webdriver

driver = webdriver.Firefox()
link = 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'

driver.get(link)
print(driver.current_url)

It is also possible to run the browser in the background (headless) so that no window pops up. An added benefit of this solution is that if the site changes the way the redirection works, you will not need to update any regular expressions in your code.
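Running the browser in the background could look roughly like the sketch below. It assumes Selenium 4 or later, with Firefox and geckodriver available on the machine. The helper name final_url_headless and its timeout parameter are made up for illustration and are not part of Selenium. It also assumes the last redirect has completed by the time the page load returns; a script-driven redirect that fires later might need an explicit wait.

```python
def final_url_headless(link, timeout=30):
    """Load `link` in headless Firefox and return the URL the browser ends up on.

    Hypothetical helper for illustration; assumes Selenium 4+ and geckodriver.
    """
    # Imported inside the function so that defining it does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window

    driver = webdriver.Firefox(options=options)
    try:
        # Give up if the page has not finished loading after `timeout` seconds.
        driver.set_page_load_timeout(timeout)
        driver.get(link)
        # current_url reflects whatever address the browser landed on.
        return driver.current_url
    finally:
        # Always shut the browser down, even if loading failed.
        driver.quit()
```

It would be called as print(final_url_headless('http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245')). Closing the driver in a finally block keeps stray Firefox processes from piling up when a page fails to load.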

Tags: python, urllib, python-3.4, url-redirection

user3691767
