I am currently trying to log into a site using Python however the site seems to be sending a cookie and a redirect statement on the same page. Python seems to be following that redirect thus preventing me from reading the cookie send by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?
Urllib package is the URL handling module for python. It is used to fetch URLs (Uniform Resource Locators). It uses the urlopen function and is able to fetch URLs using a variety of different protocols. Urllib is a package that collects several modules for working with URLs, such as: urllib.
True, if you want to avoid adding any dependencies, urllib is available. But note that even the Python official documentation recommends the requests library: "The Requests package is recommended for a higher-level HTTP client interface."
The urllib. request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more. The Requests package is recommended for a higher-level HTTP client interface.
You could do a couple of things:
This is a quick little thing that shows both
import urllib2
#redirect_handler = urllib2.HTTPRedirectHandler()
class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
def http_error_302(self, req, fp, code, msg, headers):
print "Cookie Manip Right Here"
return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
http_error_301 = http_error_303 = http_error_307 = http_error_302
cookieprocessor = urllib2.HTTPCookieProcessor()
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)
response =urllib2.urlopen("WHEREEVER")
print response.read()
print cookieprocessor.cookiejar
If all you need is stopping redirection, then there is a simple way to do it. For example I only want to get cookies and for a better performance I don't want to be redirected to any other page. Also I hope the code is kept as 3xx. let's use 302 for instance.
class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):
def http_response(self, request, response):
code, msg, hdrs = response.code, response.msg, response.info()
# only add this line to stop 302 redirection.
if code == 302: return response
if not (200 <= code < 300):
response = self.parent.error(
'http', request, response, code, msg, hdrs)
return response
https_response = http_response
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)
In this way, you don't even need to go into urllib2.HTTPRedirectHandler.http_error_302()
Yet more common case is that we simply want to stop redirection (as required):
class NoRedirection(urllib2.HTTPErrorProcessor):
def http_response(self, request, response):
return response
https_response = http_response
And normally use it this way:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
redirection_target = response.headers['Location']
urllib2.urlopen
calls build_opener()
which uses this list of handler classes:
handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]
You could try calling urllib2.build_opener(handlers)
yourself with a list that omits HTTPRedirectHandler
, then call the open()
method on the result to open your URL. If you really dislike redirects, you could even call urllib2.install_opener(opener)
to your own non-redirecting opener.
It sounds like your real problem is that urllib2
isn't doing cookies the way you'd like. See also How to use Python to login to a webpage and retrieve cookies for later usage?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With