Hi everybody. I'm working on a Django/mod_wsgi/Apache2 website that serves sensitive information, using HTTPS for all requests and responses. All views are written to redirect if the user isn't authenticated. It also has several views that are meant to function like RESTful web services.
I'm now in the process of writing a script that uses urllib/urllib2 to contact several of these services in order to download a series of very large files. I'm running into problems with 403: FORBIDDEN errors when attempting to log in.
The (rough-draft) method I'm using for authentication and log in is:
import getpass
import urllib
import urllib2

# log and PATH_TO_LOGIN are defined elsewhere in the script

def login( base_address, username=None, password=None ):
    # prompt for the username (if needed), password
    if username is None:
        username = raw_input( 'Username: ' )
    if password is None:
        password = getpass.getpass( 'Password: ' )
    log.info( 'Logging in %s' % username )

    # fetch the login page in order to get the csrf token
    cookieHandler = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener( urllib2.HTTPSHandler(), cookieHandler )
    urllib2.install_opener( opener )

    login_url = base_address + PATH_TO_LOGIN
    log.debug( "login_url: " + login_url )
    login_page = opener.open( login_url )

    # attempt to get the csrf token from the cookie jar
    csrf_cookie = None
    for cookie in cookieHandler.cookiejar:
        if cookie.name == 'csrftoken':
            csrf_cookie = cookie
            break
    # check the cookie we searched for, not the loop variable
    if csrf_cookie is None:
        raise IOError( "No csrf cookie found" )
    log.debug( "found csrf cookie: " + str( csrf_cookie ) )
    log.debug( "csrf_token = %s" % csrf_cookie.value )

    # login using the usr, pwd, and csrf token
    login_data = urllib.urlencode( dict(
        username=username, password=password,
        csrfmiddlewaretoken=csrf_cookie.value ) )
    log.debug( "login_data: %s" % login_data )

    req = urllib2.Request( login_url, login_data )
    response = urllib2.urlopen( req )
    # <--- 403: FORBIDDEN here

    log.debug( 'response url:\n' + str( response.geturl() ) + '\n' )
    log.debug( 'response info:\n' + str( response.info() ) + '\n' )

    # should redirect to the welcome page here, if back at log in - refused
    if response.geturl() == login_url:
        raise IOError( 'Authentication refused' )

    log.info( '\t%s is logged in' % username )
    # save the cookies/opener for further actions
    return opener
I'm using urllib2's HTTPCookieProcessor to store Django's authentication cookies on the script side so I can access the web services and get through my redirects.
I know that Django's CSRF middleware will reject the request if I don't pass the CSRF token along with the login information, so I first pull the token from the cookie jar after the initial load of the login page/form. This works with the HTTP development version of the site.
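For reference, the token lookup described above can be isolated into a small helper. This is a minimal sketch in Python 3, where `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar`. The names `find_csrf_token` and `make_cookie` and the demo domain are made up for illustration, and `csrftoken` is Django's default cookie name, which a site can change through the `CSRF_COOKIE_NAME` setting.

```python
from http.cookiejar import Cookie, CookieJar


def find_csrf_token(cookiejar, name="csrftoken"):
    """Return the value of the named cookie, or None if it is absent."""
    for cookie in cookiejar:
        if cookie.name == name:
            return cookie.value
    return None


def make_cookie(name, value, domain="example.com"):
    """Build a bare-bones cookie for demonstration (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Stand-in for the jar that HTTPCookieProcessor fills after the first GET.
jar = CookieJar()
jar.set_cookie(make_cookie("sessionid", "abc"))
jar.set_cookie(make_cookie("csrftoken", "token123"))

token = find_csrf_token(jar)
```

Returning `None` rather than raising leaves it to the caller to decide how a missing token should be reported.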
Specifically, I'm getting a 403 when trying to post the credentials to the login page/form over the https connection. This method works when used on the development server which uses an http connection.
There is no Apache directory directive that prevents access to that area (that I can see). The script connects successfully to the login page without post data so I'm thinking that would leave Apache out of the problem (but I could be wrong).
The python installations I'm using are both compiled with SSL.
I've also read that urllib2 doesn't allow https connections via proxy. I'm not very experienced with proxies, so I don't know if using a script from a remote machine is actually a proxy connection and whether that would be the problem. Is this causing the access problem?
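One way to rule a proxy in or out is to ask Python which proxies it would pick up from the environment, and to build an opener that bypasses them. The following is a minimal Python 3 sketch (in Python 2 the corresponding calls are `urllib.getproxies` and `urllib2.ProxyHandler`); the function name `build_direct_opener` is made up here.

```python
import urllib.request


def build_direct_opener(*handlers):
    """Build an opener that ignores proxy settings from the environment."""
    # An empty mapping tells ProxyHandler to use no proxies at all,
    # instead of auto-detecting them from variables such as https_proxy.
    no_proxy = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy, *handlers)


# Proxies that urllib would auto-detect; an empty dict means none are set.
detected = urllib.request.getproxies()
print("detected proxies:", detected)

# Cookie-aware opener that always connects directly.
opener = build_direct_opener(urllib.request.HTTPCookieProcessor())
```

If `detected` is empty, the script is already connecting directly and a proxy is unlikely to be the cause of the 403.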
From what I can tell, the problem is in the combination of cookies and the post data, but I'm unclear as to where to take it from here.
Any help would be appreciated. Thanks
Please excuse my answering my own question, but - for the record this seems to have solved it:
It turns out I needed to set the HTTP Referer header to the login page url in the request where I post the login information.
req.add_header( 'Referer', login_url )
The reason is explained in the Django CSRF documentation, specifically step 4.
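Put together, the login POST with the Referer header might look like the following. This is a minimal sketch in Python 3 (`urllib.request` replaces `urllib2`), not the exact script from the question; the function name `build_login_request`, the example URL, and the credentials are placeholders.

```python
import urllib.parse
import urllib.request


def build_login_request(login_url, username, password, csrf_token):
    """Build the login POST, including the Referer that Django checks on HTTPS."""
    data = urllib.parse.urlencode({
        "username": username,
        "password": password,
        "csrfmiddlewaretoken": csrf_token,
    }).encode("ascii")
    req = urllib.request.Request(login_url, data)
    # Without this header the request is rejected with
    # 'Referer checking failed - no referer' on an HTTPS site.
    req.add_header("Referer", login_url)
    return req


req = build_login_request(
    "https://example.com/accounts/login/",  # placeholder URL
    "alice", "secret", "token123",
)
# Send it with the same cookie-aware opener that fetched the login page:
# response = opener.open(req)
```

The request has to go through the opener that holds the `csrftoken` cookie, since Django compares the posted token against that cookie.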
Due to our somewhat peculiar server setup, where we use HTTPS on the production side with DEBUG=False, I wasn't seeing the csrf_failure reason (in this case: 'Referer checking failed - no referer') that is normally output in the DEBUG info. I ended up printing that failure reason to the Apache error_log and searching the web for it. That led me to code.djangoproject/.../csrf.py and the Referer header fix.
The code below, which is inspired by yours, works against my Django setup over HTTPS. I'm starting to think that the problem is outside this code... Is the server saying anything? I would look into Apache.
I'm running it from my local machine against my server, which uses SSL on nginx, so Apache might be the place to look. I suppose one way to narrow it down is to try your script on my login page :) Shoot me an email!
import urllib
import urllib2
import contextlib


def login(login_url, username, password):
    """
    Login to site
    """
    cookies = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener(cookies)
    urllib2.install_opener(opener)

    opener.open(login_url)

    try:
        token = [x.value for x in cookies.cookiejar if x.name == 'csrftoken'][0]
    except IndexError:
        return False, "no csrftoken"

    params = dict(username=username, password=password,
                  this_is_the_login_form=True,
                  csrfmiddlewaretoken=token,
                  )
    encoded_params = urllib.urlencode(params)

    with contextlib.closing(opener.open(login_url, encoded_params)) as f:
        html = f.read()
        print html
        # we're in.