Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logging into SAML/Shibboleth authenticated server using python

I'm trying to login my university's server via python, but I'm entirely unsure of how to go about generating the appropriate HTTP POSTs, creating the keys and certificates, and other parts of the process I may be unfamiliar with that are required to comply with the SAML spec. I can login with my browser just fine, but I'd like to be able to login and access other contents within the server using python.

For reference, here is the site

I've tried logging in by using mechanize (selecting the form, populating the fields, clicking the submit button control via mechanize.Broswer.submit(), etc.) to no avail; the login site gets spat back each time.

At this point, I'm open to implementing a solution in whichever language is most suitable to the task. Basically, I want to programatically login to SAML authenticated server.

like image 221
David Perlaza Avatar asked May 12 '13 23:05

David Perlaza


People also ask

Is shibboleth the same as SAML?

Shibboleth is a web-based Single Sign-On infrastructure. It is based on SAML, a standard for the exchange of authentication data.

How do I authenticate using SAML?

SAML uses a claims-based authentication workflow. First, when a user tries to access a site, the service provider asks the identity provider to authenticate the user. Then, the service provider uses the SAML assertion issued by the identity provider to grant the user access.

How do I fix authentication failed on SAML?

SAML errors usually occur when there's missing or incorrect information entered during your SAML setup. You can resolve most of these issues from your IDP settings, but for some, you'll need to update your SSO settings in Slack as well.


9 Answers

Basically what you have to understand is the workflow behind a SAML authentication process. Unfortunately, there is no PDF out there which seems to really provide a good help in finding out what kind of things the browser does when accessing to a SAML protected website.

Maybe you should take a look to something like this: http://www.docstoc.com/docs/33849977/Workflow-to-Use-Shibboleth-Authentication-to-Sign and obviously to this: http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language. In particular, focus your attention to this scheme:

enter image description here

What I did when I was trying to understand SAML way of working, since documentation was so poor, was writing down (yes! writing - on the paper) all the steps the browser was doing from the first to the last. I used Opera, setting it in order to not allow automatic redirects (300, 301, 302 response code, and so on), and also not enabling Javascript. Then I wrote down all the cookies the server was sending me, what was doing what, and for what reason.

Maybe it was way too much effort, but in this way I was able to write a library, in Java, which is suited for the job, and incredibily fast and efficient too. Maybe someday I will release it public...

What you should understand is that, in a SAML login, there are two actors playing: the IDP (identity provider), and the SP (service provider).

A. FIRST STEP: the user agent request the resource to the SP

I'm quite sure that you reached the link you reference in your question from another page clicking to something like "Access to the protected website". If you make some more attention, you'll notice that the link you followed is not the one in which the authentication form is displayed. That's because the clicking of the link from the IDP to the SP is a step for the SAML. The first step, actally. It allows the IDP to define who are you, and why you are trying to access its resource. So, basically what you'll need to do is making a request to the link you followed in order to reach the web form, and getting the cookies it'll set. What you won't see is a SAMLRequest string, encoded into the 302 redirect you will find behind the link, sent to the IDP making the connection.

I think that it's the reason why you can't mechanize the whole process. You simply connected to the form, with no identity identification done!

B. SECOND STEP: filling the form, and submitting it

This one is easy. Please be careful! The cookies that are now set are not the same of the cookies above. You're now connecting to a utterly different website. That's the reason why SAML is used: different website, same credentials. So you may want to store these authentication cookies, provided by a successful login, to a different variable. The IDP now is going to send back you a response (after the SAMLRequest): the SAMLResponse. You have to detect it getting the source code of the webpage to which the login ends. In fact, this page is a big form containing the response, with some code in JS which automatically subits it, when the page loads. You have to get the source code of the page, parse it getting rid of all the HTML unuseful stuff, and getting the SAMLResponse (encrypted).

C. THIRD STEP: sending back the response to the SP

Now you're ready to end the procedure. You have to send (via POST, since you're emulating a form) the SAMLResponse got in the previous step, to the SP. In this way, it will provide the cookies needed to access to the protected stuff you want to access.

Aaaaand, you're done!

Again, I think that the most precious thing you'll have to do is using Opera and analyzing ALL the redirects SAML does. Then, replicate them in your code. It's not that difficult, just keep in mind that the IDP is utterly different than the SP.

like image 71
Gian Segato Avatar answered Oct 05 '22 16:10

Gian Segato


Mechanize can do the work as well except it doesn't handle Javascript. Authentification successfully worked but once on the homepage, I couldn't load such link:

<a href="#" id="formMenu:linknotes1"
   onclick="return oamSubmitForm('formMenu','formMenu:linknotes1');">

In case you need Javascript, better use Selenium with PhantomJS. Otherwise, I hope you will find inspiration from this script:

#!/usr/bin/env python
#coding: utf8
import sys, logging
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text

br = mechanize.Browser() # Browser
cj = cookielib.LWPCookieJar() # Cookie Jar
br.set_cookiejar(cj) 

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36')]

br.open('https://ent.unr-runn.fr/uPortal/')
br.select_form(nr=0)
br.submit()

br.select_form(nr=0)
br.form['username'] = 'myusername'
br.form['password'] = 'mypassword'
br.submit()

br.select_form(nr=0)
br.submit()

rs = br.open('https://ent.unr-runn.fr/uPortal/f/u1240l1s214/p/esup-mondossierweb.u1240l1n228/max/render.uP?pP_org.apache.myfaces.portlet.MyFacesGenericPortlet.VIEW_ID=%2Fstylesheets%2Fetu%2Fdetailnotes.xhtml')

# Eventually comparing the cookies with those on Live HTTP Header: 
print "Cookies:"
for cookie in cj:
    print cookie

# Displaying page information
print rs.read()
print rs.geturl()
print rs.info();

# And that last line didn't work
rs = br.follow_link(id="formMenu:linknotes1", nr=0)
like image 31
Stéphane Bruckert Avatar answered Oct 05 '22 16:10

Stéphane Bruckert


Extending the answer from Stéphane Bruckert above, once you have used Selenium to get the auth cookies, you can still switch to requests if you want to:

import requests
cook = {i['name']: i['value'] for i in driver.get_cookies()}
driver.quit()
r = requests.get("https://protected.ac.uk", cookies=cook)
like image 42
bjw Avatar answered Oct 05 '22 18:10

bjw


You can find here a more detailed description of the Shibboleth authentication process.

like image 35
andrebask Avatar answered Oct 05 '22 17:10

andrebask


I wrote a simple Python script capable of logging into a Shibbolized page.

First, I used Live HTTP Headers in Firefox to watch the redirects for the particular Shibbolized page I was targeting.

Then I wrote a simple script using urllib.request (in Python 3.4, but the urllib2 in Python 2.x seems to have the same functionality). I found that the default redirect following of urllib.request worked for my purposes, however I found it nice to subclass the urllib.request.HTTPRedirectHandler and in this subclass (class ShibRedirectHandler) add a handler for all the http_error_302 events.

In this subclass I just printed out values of the parameters (for debugging purposes); please note that in order to utilize the default redirect following, you need to end the handler with return HTTPRedirectHandler.http_error_302(self, args...) (i.e. a call to the base class http_errror_302 handler.)

The most important component to make urllib work with Shibbolized Authentication is to create OpenerDirector which has Cookie handling added. You build the OpenerDirector with the following:

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)
response = opener.open("https://shib.page.org")

Here is a full script that may get your started (you will need to change a few mock URLs I provided and also enter valid username and password). This uses Python 3 classes; to make this work in Python2 replace urllib.request with urllib2 and urlib.parse with urlparse:

import urllib.request
import urllib.parse

#Subclass of HTTPRedirectHandler. Does not do much, but is very
#verbose. prints out all the redirects. Compaire with what you see
#from looking at your browsers redirects (using live HTTP Headers or similar)
class ShibRedirectHandler (urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print (req)
        print (fp.geturl())
        print (code)
        print (msg)
        print (headers)
        #without this return (passing parameters onto baseclass) 
        #redirect following will not happen automatically for you.
        return urllib.request.HTTPRedirectHandler.http_error_302(self,
                                                          req,
                                                          fp,
                                                          code,
                                                          msg,
                                                          headers)

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)

#Edit: should be the URL of the site/page you want to load that is protected with Shibboleth
(opener.open("https://shibbolized.site.example").read())

#Inspect the page source of the Shibboleth login form; find the input names for the username
#and password, and edit according to the dictionary keys here to match your input names
loginData = urllib.parse.urlencode({'username':'<your-username>', 'password':'<your-password>'})
bLoginData = loginData.encode('ascii')

#By looking at the source of your Shib login form, find the URL the form action posts back to
#hard code this URL in the mock URL presented below.
#Make sure you include the URL, port number and path
response = opener.open("https://test-idp.server.example", bLoginData)
#See what you got.
print (response.read())
like image 45
chladni Avatar answered Oct 05 '22 17:10

chladni


Though already answered , hopefully this helps someone.I had a task of downloading files from an SAML Website and got help from Stéphane Bruckert's answer.

If headless is used then the wait time would need to be specified at the required intervals of redirection for login. Once the browser logged in I used the cookies from that and used it with the requests module to download the file - Got help from this.

This is how my code looks like-

from selenium import webdriver
from selenium.webdriver.chrome.options import Options  #imports

things_to_download= [a,b,c,d,e,f]     #The values changing in the url
options = Options()
options.headless = False
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
driver.get('https://website.to.downloadfrom.com/')
driver.find_element_by_id('username').send_keys("Your_username") #the ID would be different for different website/forms
driver.find_element_by_id('password').send_keys("Your_password")
driver.find_element_by_id('logOnForm').submit()
session = requests.Session()
cookies = driver.get_cookies()
for things in things_to_download:    
    for cookie in cookies: 
        session.cookies.set(cookie['name'], cookie['value'])
    response = session.get('https://website.to.downloadfrom.com/bla/blabla/' + str(things_to_download))
    with open('Downloaded_stuff/'+str(things_to_download)+'.pdf', 'wb') as f:
        f.write(response.content)            # saving the file
driver.close()
like image 21
TheBroda Avatar answered Oct 05 '22 16:10

TheBroda


I wrote this code following the accepted answer. This worked for me in two separate projects

import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib


cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_cookiejar(cj)

br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_refresh(False)
br.set_handle_referer(True)
br.set_handle_robots(False)

br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]


br.open("The URL goes here")

br.select_form(nr=0)

br.form['username'] = 'Login Username'
br.form['password'] = 'Login Password'
br.submit()

br.select_form(nr=0)
br.submit()

response = br.response().read()
print response
like image 40
draysams Avatar answered Oct 05 '22 18:10

draysams


I faced a similar problem with my university page SAML authentication as well.

The base idea is to use a requests.session object to automatically handle most of the http redirects and cookie storing. However, there were many redirects using both javascript as well, and this caused multiple problems using the simple requests solution.

I ended up using fiddler to keep track of every request my browser made to the university server to fill up the redirects I've missed. It really made the process easier.

My solution is far from ideal, but seems to work.

like image 42
Arthur.V Avatar answered Oct 05 '22 16:10

Arthur.V


If all else fails, I'd suggest using Selenium's webdriver in 'headfull' mode (i.e. a browser window will open, allowing one to input the username, password, and any other necessary login info), which would allow easy access the target website even if your form is more complex than the standard 'username' and 'password' duo and you're unsure how to fill in the br.form sections mentioned in the other answers.

from selenium import webdriver
import time

DRIVER_PATH = r'C:/INSERT_YOUR_PATH_HERE/chromedriver.exe'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get('https://moodle.tau.ac.il/login/index.php') # This is the login screen

Once you do so, you can create a loop which checks if you've reached your destination URL - if so, you're in! This snippet of code worked for me; My goal was to access my university's coursework website Moodle and download all of the PDFs automatically.

targetUrl = False
timeElapsed = 0

def downloadAllPDFs():         # Or any other function you'd like, the point is that 
    print("Access Granted!")   # you now have access to the HTML. 

while not targetUrl and timeElapsed < 60:
    time.sleep(1)
    timeElapsed += 1
    if driver.current_url == r"https://moodle.tau.ac.il/my/": # The site you're trying to login to.
        downloadAllPDFs()
        targetUrl = True 
like image 24
Yoni Friedman Avatar answered Oct 05 '22 17:10

Yoni Friedman