I've been looking through the Python Requests documentation but I cannot see any functionality for what I am trying to achieve.
In my script I am setting allow_redirects=True
.
I would like to know if the page has been redirected to something else, what is the new URL.
For example, if the start URL was: www.google.com/redirect
And the final URL is www.google.co.uk/redirected
How do I get that URL?
Use Python urllib Library To Get Redirection URL. request module. Define a web page URL, suppose this URL will be redirected when you send a request to it. Get the response object. Get the webserver returned response status code, if the code is 301 then it means the URL has been redirected permanently.
Normally, fetch transparently follows HTTP-redirects, like 301, 302 etc.
Redirects allow you to forward the visitors of a specific URL to another page of your website. In Site Tools, you can add redirects by going to Domain > Redirects. Choose the desired domain, fill in the URL you want to redirect to another and add the URL of the new page destination. When ready, click Create.
You are looking for the request history.
The response.history
attribute is a list of responses that led to the final URL, which can be found in response.url
.
response = requests.get(someurl)
if response.history:
print("Request was redirected")
for resp in response.history:
print(resp.status_code, resp.url)
print("Final destination:")
print(response.status_code, response.url)
else:
print("Request was not redirected")
Demo:
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
... print(resp.status_code, resp.url)
...
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
This is answering a slightly different question, but since I got stuck on this myself, I hope it might be useful for someone else.
If you want to use allow_redirects=False
and get directly to the first redirect object, rather than following a chain of them, and you just want to get the redirect location directly out of the 302 response object, then r.url
won't work. Instead, it's the "Location" header:
r = requests.get('http://github.com/', allow_redirects=False)
r.status_code # 302
r.url # http://github.com, not https.
r.headers['Location'] # https://github.com/ -- the redirect destination
the documentation has this blurb https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history
import requests
r = requests.get('http://www.github.com')
r.url
#returns https://www.github.com instead of the http page you asked for
I think requests.head
instead of requests.get
will be more safe to call when handling url redirect. Check a GitHub issue here:
r = requests.head(url, allow_redirects=True)
print(r.url)
For python3.5, you can use the following code:
import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With