I can use urllib2 to make HEAD requests like so:
import urllib2
request = urllib2.Request('http://example.com')
request.get_method = lambda: 'HEAD'
urllib2.urlopen(request)
The problem is that when urllib2 follows a redirect, it appears to use GET instead of HEAD for the follow-up request.
The purpose of this HEAD request is to check the size and content type of the URL I'm about to download, so that I don't download some huge document. (The URL is supplied by a random internet user through IRC.)
How could I make it use HEAD requests when following redirects?
With the requests module: define a URL that you expect to be redirected, send a request to it, and get the response object. Then check the status code the web server returned; a 301 means the URL has been redirected permanently.
So if allow_redirects is True, the redirects are followed and the response returned is the final page after following them. If allow_redirects is False, the first response is returned, even if it is a redirect.
You can do this with the requests library:
>>> import requests
>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.url
u'https://github.com/'
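Since the goal in the question is to check the size and content type before downloading, here is a minimal sketch of that check against the headers of the final response (for example `r.headers` above). The function name `is_safe_to_download`, the 10 MiB limit, and the list of allowed types are assumptions made for illustration; they are not from the original question.

```python
def is_safe_to_download(headers, max_bytes=10 * 1024 * 1024,
                        allowed_types=("text/html", "text/plain")):
    """Return True only if the headers declare an allowed type and a small size.

    `headers` is any mapping of header names to values, such as `r.headers`
    from requests. The name, limit, and allowed types are illustrative.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Drop parameters such as "; charset=utf-8" from the content type.
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()

    try:
        size = int(lowered.get("content-length", ""))
    except ValueError:
        # Missing or malformed Content-Length: treat the size as unknown.
        return False

    return content_type in allowed_types and 0 <= size <= max_bytes
```

A server can omit Content-Length or report a wrong value, so it is still worth capping the number of bytes you read during the actual download.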
Good question! If you're set on using urllib2, you'll want to look at this answer about the construction of your own redirect handler.
In short (read: blatantly stolen from the previous answer):
import urllib2
#redirect_handler = urllib2.HTTPRedirectHandler()
class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Hook point: runs for every redirect before it is followed.
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Handle the other redirect codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302
cookieprocessor = urllib2.HTTPCookieProcessor()
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)
response = urllib2.urlopen("WHEREEVER")
print response.read()
print cookieprocessor.cookiejar
Also, as mentioned in the errata, you can use Python Requests.
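For the urllib route, note that the handler above only shows where to hook in; it passes the redirect straight to the parent class, so it does not by itself keep the method as HEAD. Below is a minimal sketch of one way to do that, written for Python 3's `urllib.request` (the successor to `urllib2`) rather than the Python 2 module used in the question. The names `HeadRedirectHandler` and `head_following_redirects` are made up for this example. Some newer Python 3 releases may already keep HEAD across redirects, in which case the override changes nothing.

```python
import urllib.request


class HeadRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects, but keep using HEAD for the follow-up request."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the standard handler build the follow-up request (or refuse).
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        # If the original request was a HEAD, force the new one to be HEAD too.
        if new_req is not None and req.get_method() == "HEAD":
            new_req.method = "HEAD"
        return new_req


def head_following_redirects(url, timeout=10):
    """Send a HEAD request to `url` and return the final response object."""
    # Passing the subclass replaces the default redirect handler.
    opener = urllib.request.build_opener(HeadRedirectHandler)
    request = urllib.request.Request(url, method="HEAD")
    return opener.open(request, timeout=timeout)
```

The returned response exposes the final URL via `geturl()` and the headers via `headers`, so you can read Content-Length and Content-Type without fetching the body.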