I want to make a HEAD request without any content data to conserve bandwidth. I'm using urllib.request. However, upon testing, it appears the HEAD request also gets the data. What's going on?
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:16:31) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> req = urllib.request.Request("http://www.google.com", method="HEAD")
>>> resp = urllib.request.urlopen(req)
>>> a = resp.read()
>>> len(a)
24088
The http://www.google.com URL redirects:
$ curl -D - -X HEAD http://www.google.com
HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.co.uk/?gfe_rd=cr&ei=A8sXVZLOGvHH8ge1jYKwDQ
Content-Length: 261
Date: Sun, 29 Mar 2015 09:50:59 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=0.5
and urllib.request has followed the redirect, issuing a GET request to that new location:
>>> import urllib.request
>>> req = urllib.request.Request("http://www.google.com", method="HEAD")
>>> resp = urllib.request.urlopen(req)
>>> resp.url
'http://www.google.co.uk/?gfe_rd=cr&ei=ucoXVdfaJOTH8gf-voKwBw'
You'd have to build your own handler stack to prevent this; the HTTPRedirectHandler isn't smart enough to leave a redirect alone when the request method is HEAD. Adapting the example from Alan Duan in "How do I prevent Python's urllib(2) from following a redirect" to Python 3 would give you:
import urllib.request

class NoRedirection(urllib.request.HTTPErrorProcessor):
    # Return every response as-is so 3xx codes never reach HTTPRedirectHandler.
    def http_response(self, request, response):
        return response

    https_response = http_response

opener = urllib.request.build_opener(NoRedirection)
req = urllib.request.Request("http://www.google.com", method="HEAD")
resp = opener.open(req)
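As a rough sketch of how such an opener might be used, the helper below wraps it in a function and returns the status code and the Location header of the first response, without following the redirect. The name head_no_redirect, the timeout default, and the empty ProxyHandler (which only disables proxies taken from the environment) are illustrative assumptions, not part of the original example.

```python
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Pass every response through untouched, including 3xx ones."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def head_no_redirect(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None).

    Hypothetical helper, for illustration only.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # assumption: ignore proxy env vars
        NoRedirection,                    # replaces the default error processor
    )
    req = urllib.request.Request(url, method="HEAD")
    with opener.open(req, timeout=timeout) as resp:
        return resp.status, resp.headers.get("Location")
```

Calling head_no_redirect("http://www.google.com") should then hand back the redirect status and its target rather than the final page. Note that with this error processor in place, 4xx and 5xx responses are also returned as ordinary responses instead of raising HTTPError, so the caller has to check the status itself.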
You'd be better off using the requests library; it explicitly sets allow_redirects=False when using the requests.head() or requests.Session().head() callables, so there you can see the original result:
>>> import requests
>>> requests.head('http://www.google.com')
<Response [302]>
>>> _.headers['Location']
'http://www.google.co.uk/?gfe_rd=cr&ei=FcwXVbepMvHH8ge1jYKwDQ'
and even if redirection is enabled, the response.history list gives you access to the intermediate requests, and requests uses the correct method for the redirected call too:
>>> response = requests.head('http://www.google.com', allow_redirects=True)
>>> response.url
'http://www.google.co.uk/?gfe_rd=cr&ei=8e0XVYfGMubH8gfJnoKoDQ'
>>> response.history
[<Response [302]>]
>>> response.history[0].url
'http://www.google.com/'
>>> response.request.method
'HEAD'
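If installing requests is not an option, something similar can probably be approximated with urllib.request alone by subclassing HTTPRedirectHandler and restoring the HEAD method on the redirected request. This is a minimal sketch under assumptions: the class name HeadPreservingRedirectHandler and the helper head_following_redirects are made up for illustration, and the empty ProxyHandler only disables proxies taken from the environment.

```python
import urllib.request


class HeadPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects, but keep using HEAD if the original request did."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            # Request.get_method() consults this attribute first.
            new_req.method = "HEAD"
        return new_req


def head_following_redirects(url, timeout=10):
    """Hypothetical helper: HEAD a URL, following redirects with HEAD.

    Returns (status, final URL, body bytes).
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # assumption: ignore proxy env vars
        HeadPreservingRedirectHandler,    # replaces the default redirect handler
    )
    req = urllib.request.Request(url, method="HEAD")
    with opener.open(req, timeout=timeout) as resp:
        return resp.status, resp.url, resp.read()
```

Because every hop is sent as HEAD, the body returned by the helper should be empty, while the final URL still shows where the redirect chain ended.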