Use Python urllib Library To Get Redirection URL. request module. Define a web page URL, suppose this URL will be redirected when you send a request to it. Get the response object. Get the webserver returned response status code, if the code is 301 then it means the URL has been redirected permanently.
We use requests. get() method since we are sending a GET request. The two arguments we pass are url and the parameters dictionary. Now, in order to retrieve the data from the response object, we need to convert the raw response content into a JSON type data structure.
Here is the Requests way:
import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers['Location'])
Dive Into Python has a good chapter on handling redirects with urllib2. Another solution is httplib.
>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location
This is a urllib2 handler that will not follow redirects:
class NoRedirectHandler(urllib2.HTTPRedirectHandler):
def http_error_302(self, req, fp, code, msg, headers):
infourl = urllib.addinfourl(fp, headers, req.get_full_url())
infourl.status = code
infourl.code = code
return infourl
http_error_300 = http_error_302
http_error_301 = http_error_302
http_error_303 = http_error_302
http_error_307 = http_error_302
opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)
The redirections
keyword in the httplib2
request method is a red herring. Rather than return the first request it will raise a RedirectLimit
exception if it receives a redirection status code. To return the inital response you need to set follow_redirects
to False
on the Http
object:
import httplib2
h = httplib2.Http()
h.follow_redirects = False
(response, body) = h.request("http://example.com")
i suppose this would help
from httplib2 import Http
def get_html(uri,num_redirections=0): # put it as 0 for not to follow redirects
conn = Http()
return conn.request(uri,redirections=num_redirections)
I second olt's pointer to Dive into Python. Here's an implementation using urllib2 redirect handlers, more work than it should be? Maybe, shrug.
import sys
import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
def http_error_301(self, req, fp, code, msg, headers):
result = urllib2.HTTPRedirectHandler.http_error_301(
self, req, fp, code, msg, headers)
result.status = code
raise Exception("Permanent Redirect: %s" % 301)
def http_error_302(self, req, fp, code, msg, headers):
result = urllib2.HTTPRedirectHandler.http_error_302(
self, req, fp, code, msg, headers)
result.status = code
raise Exception("Temporary Redirect: %s" % 302)
def main(script_name, url):
opener = urllib2.build_opener(RedirectHandler)
urllib2.install_opener(opener)
print urllib2.urlopen(url).read()
if __name__ == "__main__":
main(*sys.argv)
The shortest way however is
class NoRedirect(urllib2.HTTPRedirectHandler):
def redirect_request(self, req, fp, code, msg, hdrs, newurl):
pass
noredir_opener = urllib2.build_opener(NoRedirect())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With