Why am I able to read a HEAD HTTP request in Python 3 urllib.request?

Tags: python, httprequest, urllib, python-3.4

I want to make a HEAD request, with no response body, to conserve bandwidth. I'm using urllib.request. However, when I test it, the HEAD request appears to receive the body data as well. What's going on?

Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:16:31) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> req = urllib.request.Request("http://www.google.com", method="HEAD")
>>> resp = urllib.request.urlopen(req)
>>> a = resp.read()
>>> len(a)
24088

asked Mar 29 '15 09:03

Eric

1 Answer

The http://www.google.com URL redirects:

$ curl -D - -X HEAD http://www.google.com
HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.co.uk/?gfe_rd=cr&ei=A8sXVZLOGvHH8ge1jYKwDQ
Content-Length: 261
Date: Sun, 29 Mar 2015 09:50:59 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=0.5

and urllib.request has followed the redirect, issuing a GET request to that new location:

>>> import urllib.request
>>> req = urllib.request.Request("http://www.google.com", method="HEAD")
>>> resp = urllib.request.urlopen(req)
>>> resp.url
'http://www.google.co.uk/?gfe_rd=cr&ei=ucoXVdfaJOTH8gf-voKwBw'
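
To see which methods actually reach a server, here is a minimal sketch against a local test server. The server, its `/start` and `/final` paths, and the names `RedirectingHandler` and `seen_methods` are assumptions made up for this illustration; they are not part of the original question. The method used for the second request may differ between Python versions, so the sketch prints what the server recorded rather than assuming it.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_methods = []  # (method, path) pairs, in the order the server received them


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /start answers 302, pointing at /final."""

    def _respond(self, send_body):
        seen_methods.append((self.command, self.path))
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"hello from /final"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if send_body:  # a HEAD response carries headers only
            self.wfile.write(body)

    def do_HEAD(self):
        self._respond(send_body=False)

    def do_GET(self):
        self._respond(send_body=True)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Default handler stack, with proxies disabled so the loopback request
# is not routed through a proxy configured in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

url = "http://127.0.0.1:%d/start" % server.server_address[1]
req = urllib.request.Request(url, method="HEAD")
with opener.open(req) as resp:
    final_url = resp.url
    body = resp.read()

server.shutdown()
server.server_close()

print(seen_methods)          # first entry is the HEAD request to /start
print(final_url, len(body))  # URL after the redirect, bytes actually read
```

If the second recorded method is GET and `len(body)` is non-zero, the redirect handler has turned the HEAD request into a GET, which is the behaviour described in this answer.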

You'd have to build your own handler stack to prevent this; HTTPRedirectHandler follows redirects for HEAD requests like any other, and it doesn't keep the HEAD method when it does. Adapting Alan Duan's example from "How do I prevent Python's urllib(2) from following a redirect" to Python 3 gives you:

import urllib.request

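class NoRedirection(urllib.request.HTTPErrorProcessor):
    # Return every response as-is, so 3xx responses are never passed on
    # to the redirect handler (4xx and 5xx responses are returned too,
    # instead of raising HTTPError).
    def http_response(self, request, response):
        return response
    https_response = http_response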

opener = urllib.request.build_opener(NoRedirection)

req = urllib.request.Request("http://www.google.com", method="HEAD")
resp = opener.open(req)
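
To check that the redirect really was left alone, you can look at the status code and the Location header of the response. The sketch below wraps this in a helper; the name `head_no_redirect` is hypothetical and only serves this illustration.

```python
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    # Hand every response back untouched, so redirects are not followed.
    def http_response(self, request, response):
        return response
    https_response = http_response


def head_no_redirect(url):
    """Send a single HEAD request; return (status, Location header or None).

    Hypothetical helper for illustration, not part of urllib.
    """
    opener = urllib.request.build_opener(NoRedirection)
    req = urllib.request.Request(url, method="HEAD")
    with opener.open(req) as resp:
        return resp.status, resp.headers.get("Location")
```

Calling `head_no_redirect("http://www.google.com")` should then give you the redirect status and its target instead of the final page, assuming the site still redirects as in the curl output above.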

You'd be better off using the requests library; it explicitly sets allow_redirects=False when using the requests.head() or requests.Session().head() callables, so there you can see the original result:

>>> import requests
>>> requests.head('http://www.google.com')
<Response [302]>
>>> _.headers['Location']
'http://www.google.co.uk/?gfe_rd=cr&ei=FcwXVbepMvHH8ge1jYKwDQ'

Even if redirection is enabled, the response.history list gives you access to the intermediate responses, and requests uses the correct method for the redirected call too:

>>> response = requests.head('http://www.google.com', allow_redirects=True)
>>> response.url
'http://www.google.co.uk/?gfe_rd=cr&ei=8e0XVYfGMubH8gfJnoKoDQ'
>>> response.history
[<Response [302]>]
>>> response.history[0].url
'http://www.google.com/'
>>> response.request.method
'HEAD'
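
If you have to stay with urllib.request but want redirects followed with HEAD kept as HEAD, one possible approach is to subclass HTTPRedirectHandler and reset the method on the request it builds. This is a sketch under that assumption, not something the standard library provides under these names; `HeadPreservingRedirectHandler` and `head_following_redirects` are hypothetical.

```python
import urllib.request


class HeadPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: follow redirects, but keep HEAD as HEAD."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler decide whether the redirect is allowed
        # and build the follow-up request.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            # Request.get_method() reads this attribute when it is set.
            new_req.method = "HEAD"
        return new_req


def head_following_redirects(url):
    """Send HEAD, follow redirects with HEAD; return (status, final URL)."""
    opener = urllib.request.build_opener(HeadPreservingRedirectHandler)
    req = urllib.request.Request(url, method="HEAD")
    with opener.open(req) as resp:
        return resp.status, resp.url
```

Passing the subclass to build_opener() replaces the default HTTPRedirectHandler, so the rest of the handler stack stays as it was.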

answered Oct 13 '22 00:10

Martijn Pieters
