How do I prevent Python's urllib(2) from following a redirect

Tags: python, urllib2

I am currently trying to log into a site using Python; however, the site seems to send a cookie and a redirect in the same response. Python seems to follow that redirect, which prevents me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?

asked Feb 16 '09 20:02

Jack Edmonds

3 Answers

You could do a couple of things:

Build your own HTTPRedirectHandler that intercepts each redirect.
Create an instance of HTTPCookieProcessor, build an opener with it and install that opener, so that you have access to the cookiejar.

Here is a quick example that shows both:

import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Called for every redirect: inspect or manipulate cookies/headers here
        print "Cookie Manip Right Here"
        # Then fall back to the default behaviour, which follows the redirect
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Treat the other redirect status codes the same way
    http_error_301 = http_error_303 = http_error_307 = http_error_302

# Collects cookies from every response, including the redirect itself
cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREVER")  # replace with the login URL
print response.read()

print cookieprocessor.cookiejar
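In Python 3, urllib2 was merged into urllib.request and cookielib became http.cookiejar. The following is a rough Python 3 sketch of the same idea, not the answerer's code: LoggingRedirectHandler, build_cookie_opener and the seen list are made-up names for illustration, and, like the original, the handler still follows the redirect after recording it.

```python
import http.cookiejar
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but record each one first (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen = []  # (status code, Location header) pairs

    def http_error_302(self, req, fp, code, msg, headers):
        # Hook point: the headers of the 3xx response (including Set-Cookie)
        # can be inspected here before the redirect is followed.
        self.seen.append((code, headers.get("Location")))
        return super().http_error_302(req, fp, code, msg, headers)

    # Same treatment for the other redirect codes, as in the urllib2 answer
    http_error_301 = http_error_303 = http_error_307 = http_error_302


def build_cookie_opener():
    """Return an opener that logs redirects, its CookieJar and the redirect handler."""
    jar = http.cookiejar.CookieJar()
    redirects = LoggingRedirectHandler()
    # Passing a subclass instance makes build_opener skip the default redirect handler
    opener = urllib.request.build_opener(
        redirects, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar, redirects
```

After `opener, jar, redirects = build_cookie_opener()` and `opener.open(login_url)` (login_url being your own login page), any cookie set on the 302 response should already be in `jar`, because HTTPCookieProcessor processes every response, the intermediate redirect included.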

answered Oct 13 '22 22:10

pope

If all you need is to stop redirection, there is a simple way to do it. For example, I only want to get cookies, and for better performance I don't want to be redirected to any other page. I also want the response code to stay 3xx; let's use 302 as an example.

import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # The only addition to the default implementation: hand 302
        # responses back as-is instead of passing them to the redirect handler.
        if code == 302:
            return response

        # Default behaviour: non-2xx codes go to the error handlers
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

This way, the response never even reaches urllib2.HTTPRedirectHandler.http_error_302().
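A possible Python 3 version of the same 302-only processor is sketched below. Keep302Processor and build_keep302_opener are hypothetical names; unlike the answer, it delegates to super() for every other status code instead of copying the default method body, so it keeps whatever the standard library does for those codes.

```python
import http.cookiejar
import urllib.request


class Keep302Processor(urllib.request.HTTPErrorProcessor):
    """Hand 302 responses back untouched; treat every other code as usual."""

    def http_response(self, request, response):
        if response.code == 302:
            # Returning early means HTTPRedirectHandler.http_error_302 never runs.
            return response
        # Default behaviour: non-2xx codes are passed on to the error handlers.
        return super().http_response(request, response)

    https_response = http_response


def build_keep302_opener():
    """Return an opener that stops at 302 responses, plus the CookieJar it fills."""
    jar = http.cookiejar.CookieJar()
    # Passing a subclass of HTTPErrorProcessor replaces the default one
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), Keep302Processor
    )
    return opener, jar
```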

A more common case is that we simply want to stop all redirection (as the question asks):

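import urllib2

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        # Return every response unchanged. Note that this also means
        # 4xx/5xx responses no longer raise urllib2.HTTPError.
        return response

    https_response = http_response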

Then use it like this:

import cookielib
import urllib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}  # form fields to POST (empty here)
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    redirection_target = response.headers['Location']
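To check the NoRedirection approach without a real site, the sketch below runs it on Python 3 against a throwaway local http.server that plays the role of the login page. FakeLoginPage, fetch_without_redirect, the /login and /home paths and the session=abc123 cookie are all invented for the demo; ProxyHandler({}) is added only so that proxy environment variables cannot reroute the request to 127.0.0.1.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Return every response unchanged, so redirects are never followed."""

    def http_response(self, request, response):
        # Skipping the default processing also means 4xx/5xx won't raise HTTPError.
        return response

    https_response = http_response


class FakeLoginPage(http.server.BaseHTTPRequestHandler):
    """Hypothetical login page: sets a cookie and redirects in one response."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_without_redirect(url):
    """Open url once; return (status, Location header, cookies as a dict)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore proxy settings for the local demo
        urllib.request.HTTPCookieProcessor(jar),
        NoRedirection,
    )
    with opener.open(url) as response:
        status = response.status
        location = response.headers.get("Location")
    return status, location, {cookie.name: cookie.value for cookie in jar}


def demo():
    """Serve FakeLoginPage on a free local port and fetch /login from it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeLoginPage)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_without_redirect("http://127.0.0.1:%d/login" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should return (302, '/home', {'session': 'abc123'}): the cookie from the login response lands in the jar, and the redirect to /home is never requested.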

answered Oct 13 '22 22:10

Alan Duan

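urllib2.urlopen calls build_opener(), which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
            HTTPDefaultErrorHandler, HTTPRedirectHandler,
            FTPHandler, FileHandler, HTTPErrorProcessor]

You could try building an opener yourself from a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. Note that urllib2.build_opener(*handlers) always adds every default class that you haven't replaced with a subclass or instance of it, so simply passing it a shorter list is not enough; instead, create a urllib2.OpenerDirector and call its add_handler() method once for each handler you want. If you really dislike redirects, you could even call urllib2.install_opener(opener) to make your own non-redirecting opener the global default.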
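A minimal Python 3 sketch of that hand-built opener follows. build_opener_without_redirects, redirect_handler_present and open_without_redirect are hypothetical helper names. Without a redirect handler, a 3xx response falls through to HTTPDefaultErrorHandler, which raises urllib.error.HTTPError, so the sketch catches it and returns the status and headers.

```python
import urllib.error
import urllib.request


def build_opener_without_redirects():
    """Assemble an opener by hand so HTTPRedirectHandler is never added."""
    opener = urllib.request.OpenerDirector()
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPDefaultErrorHandler,
        # HTTPRedirectHandler deliberately left out
        urllib.request.FTPHandler,
        urllib.request.FileHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    if hasattr(urllib.request, "HTTPSHandler"):  # only defined when ssl is available
        handler_classes.append(urllib.request.HTTPSHandler)
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener


def redirect_handler_present(opener):
    """True if any installed handler would follow redirects."""
    return any(isinstance(h, urllib.request.HTTPRedirectHandler) for h in opener.handlers)


def open_without_redirect(url):
    """Return (status, headers); a 3xx arrives as HTTPError and is unpacked here."""
    try:
        with build_opener_without_redirects().open(url) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as err:
        # No redirect handler, so HTTPDefaultErrorHandler raised for the 3xx code
        return err.code, err.headers
```

For comparison, `redirect_handler_present(urllib.request.build_opener(urllib.request.HTTPHandler))` is True, which shows build_opener() re-adding the redirect handler on its own. Since this opener has no HTTPCookieProcessor, the cookies would have to be read from the returned headers, e.g. `headers.get_all("Set-Cookie")`.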

It sounds like your real problem is that urllib2 isn't handling cookies the way you'd like. See also the question "How to use Python to login to a webpage and retrieve cookies for later usage?"

answered Oct 13 '22 23:10

joeforker
