Logging into SAML/Shibboleth authenticated server using python

Tags:

python

authentication

login

saml

saml-2.0

shibboleth

I'm trying to log in to my university's server via Python, but I'm entirely unsure of how to go about generating the appropriate HTTP POSTs, creating the keys and certificates, and handling any other parts of the process required to comply with the SAML spec that I may be unfamiliar with. I can log in with my browser just fine, but I'd like to be able to log in and access other content on the server using Python.

For reference, here is the site

I've tried logging in using mechanize (selecting the form, populating the fields, clicking the submit button control via mechanize.Browser.submit(), etc.) to no avail; the login page gets spat back each time.

At this point, I'm open to implementing a solution in whichever language is most suitable to the task. Basically, I want to programmatically log in to a SAML-authenticated server.

221

asked May 12 '13 23:05

David Perlaza

9 Answers

Basically, what you have to understand is the workflow behind a SAML authentication process. Unfortunately, there doesn't seem to be a PDF out there that really helps in finding out what the browser does when accessing a SAML-protected website.

Maybe you should take a look at something like this: http://www.docstoc.com/docs/33849977/Workflow-to-Use-Shibboleth-Authentication-to-Sign and, obviously, at this: http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language. In particular, focus your attention on this scheme:

enter image description here

What I did when I was trying to understand how SAML works, since the documentation was so poor, was to write down (yes, writing, on paper) every step the browser took, from the first to the last. I used Opera, set up not to follow redirects automatically (300, 301, 302 response codes, and so on) and with JavaScript disabled. Then I wrote down all the cookies the server was sending me, what was doing what, and for what reason.

Maybe it was way too much effort, but this way I was able to write a library, in Java, which is suited for the job, and incredibly fast and efficient too. Maybe someday I will release it publicly...

What you should understand is that, in a SAML login, there are two actors involved: the IDP (identity provider) and the SP (service provider).

A. FIRST STEP: the user agent requests the resource from the SP

I'm quite sure that you reached the link you reference in your question from another page, by clicking on something like "Access the protected website". If you pay closer attention, you'll notice that the link you followed is not the one on which the authentication form is displayed. That's because following that link, which hands you over from the SP to the IDP, is itself a step of the SAML process. The first step, actually. It lets the IDP establish who is asking and which resource you are trying to access. So, basically, what you need to do is make a request to the link you followed in order to reach the web form, and keep the cookies it sets. What you won't see in the browser is a SAMLRequest string, encoded into the 302 redirect behind the link and sent to the IDP when the connection is made.

I think that's the reason why you can't mechanize the whole process. You simply connected to the form directly, without this identification step having been done!
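To make the hidden SAMLRequest from this first step visible, here is a minimal sketch that decodes it from the Location header of the redirect. It assumes the SP uses the SAML HTTP-Redirect binding (Base64 over raw DEFLATE); the URL, the sample XML and both function names are made up for illustration, and encode_saml_request exists only to build a sample to decode.

```python
import base64
import zlib
from urllib.parse import urlparse, parse_qs, urlencode


def decode_saml_request(redirect_url):
    """Return the XML carried in a redirect URL's SAMLRequest parameter."""
    query = parse_qs(urlparse(redirect_url).query)
    encoded = query["SAMLRequest"][0]        # parse_qs already URL-decodes the value
    compressed = base64.b64decode(encoded)
    # The HTTP-Redirect binding uses raw DEFLATE (no zlib header), hence wbits=-15.
    return zlib.decompress(compressed, -15).decode("utf-8")


def encode_saml_request(xml):
    """Inverse of decode_saml_request; only here to build a sample redirect URL."""
    compressor = zlib.compressobj(wbits=-15)
    raw = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")


# Hypothetical redirect target, standing in for a real 302 Location header.
sample_xml = '<samlp:AuthnRequest ID="_example" Version="2.0"/>'
location = "https://idp.example.org/sso?" + urlencode(
    {"SAMLRequest": encode_saml_request(sample_xml), "RelayState": "token"}
)
print(decode_saml_request(location))
```

In a real run you would take the Location header from the SP's 302 response (with automatic redirects switched off) and pass it to decode_saml_request.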

B. SECOND STEP: filling in the form and submitting it

This one is easy. But be careful! The cookies that are set now are not the same as the cookies above. You're now connected to a completely different website. That's the reason SAML is used: different websites, same credentials. So you may want to store these authentication cookies, provided by a successful login, in a separate variable. The IDP is now going to send you back a response (the counterpart of the SAMLRequest): the SAMLResponse. You have to find it in the source code of the web page the login ends on. In fact, this page is one big form containing the response, with some JavaScript that automatically submits it when the page loads. You have to get the source code of the page, parse it, strip away all the HTML you don't need, and extract the SAMLResponse (Base64-encoded, and possibly encrypted).
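Pulling the SAMLResponse out of that auto-submitting page could look roughly like this. It is only a sketch using the standard library's html.parser; the sample page, its URLs and the field values are invented, and a real IDP page may be laid out differently (more than one form, different hidden fields).

```python
from html.parser import HTMLParser


class SamlFormParser(HTMLParser):
    """Collects the first form's action URL and all hidden input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_saml_form(html):
    """Return (action_url, hidden_fields) for the page the IDP sends after login."""
    parser = SamlFormParser()
    parser.feed(html)
    return parser.action, parser.fields


# Invented example of the page a browser would submit automatically via JavaScript.
sample_page = """
<html><body onload="document.forms[0].submit()">
<form method="post" action="https://sp.example.org/Shibboleth.sso/SAML2/POST">
<input type="hidden" name="RelayState" value="cookie:abc123"/>
<input type="hidden" name="SAMLResponse" value="PHNhbWxwOlJlc3BvbnNlLz4="/>
<noscript><input type="submit" value="Continue"/></noscript>
</form></body></html>
"""
action, fields = extract_saml_form(sample_page)
print(action)
print(sorted(fields))
```

The form's action is the SP address the response has to be posted to, and the hidden fields are the body of that POST.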

C. THIRD STEP: sending the response back to the SP

Now you're ready to finish the procedure. You have to send the SAMLResponse obtained in the previous step to the SP (via POST, since you're emulating a form). In return, the SP will provide the cookies needed to access the protected content you're after.

Aaaaand, you're done!
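As an illustration of that last POST, this sketch builds the request with urllib without sending it. The URL and values are placeholders, build_acs_post is a name I made up, and whether your SP also expects a RelayState field depends on its setup.

```python
import urllib.parse
import urllib.request


def build_acs_post(acs_url, saml_response, relay_state=None):
    """Build (but do not send) the form POST that hands the SAMLResponse to the SP."""
    form = {"SAMLResponse": saml_response}
    if relay_state is not None:
        form["RelayState"] = relay_state
    # urlencode percent-escapes the '+', '/' and '=' characters found in Base64.
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        acs_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_acs_post(
    "https://sp.example.org/Shibboleth.sso/SAML2/POST",   # placeholder SP URL
    "PHNhbWxwOlJlc3BvbnNlLz4=",                           # placeholder response
    relay_state="cookie:abc123",
)
print(request.get_method(), request.full_url)
```

To actually send it, open the request with an opener that keeps cookies (for example one built with urllib.request.HTTPCookieProcessor), so that the cookies the SP sets in reply are stored for later requests.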

Again, I think the most valuable thing you can do is use Opera and analyze ALL the redirects SAML performs. Then, replicate them in your code. It's not that difficult; just keep in mind that the IDP is a completely different site from the SP.

answered Oct 05 '22 16:10

Gian Segato

Mechanize can do the job as well, except that it doesn't handle JavaScript. Authentication worked, but once on the homepage, I couldn't follow a link like this one:

<a href="#" id="formMenu:linknotes1"
   onclick="return oamSubmitForm('formMenu','formMenu:linknotes1');">

In case you need JavaScript, better use Selenium with PhantomJS. Otherwise, I hope you will find inspiration in this script:

#!/usr/bin/env python
#coding: utf8
import sys, logging
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text

br = mechanize.Browser() # Browser
cj = cookielib.LWPCookieJar() # Cookie Jar
br.set_cookiejar(cj) 

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refresh 0, but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36')]

br.open('https://ent.unr-runn.fr/uPortal/')
br.select_form(nr=0)
br.submit()

br.select_form(nr=0)
br.form['username'] = 'myusername'
br.form['password'] = 'mypassword'
br.submit()

br.select_form(nr=0)
br.submit()

rs = br.open('https://ent.unr-runn.fr/uPortal/f/u1240l1s214/p/esup-mondossierweb.u1240l1n228/max/render.uP?pP_org.apache.myfaces.portlet.MyFacesGenericPortlet.VIEW_ID=%2Fstylesheets%2Fetu%2Fdetailnotes.xhtml')

# Optionally compare the cookies with those shown by Live HTTP Headers:
print "Cookies:"
for cookie in cj:
    print cookie

# Displaying page information
print rs.read()
print rs.geturl()
print rs.info()

# And that last line didn't work
rs = br.follow_link(id="formMenu:linknotes1", nr=0)

answered Oct 05 '22 16:10

Stéphane Bruckert

Extending the answer from Stéphane Bruckert above, once you have used Selenium to get the auth cookies, you can still switch to requests if you want to:

import requests
cook = {i['name']: i['value'] for i in driver.get_cookies()}
driver.quit()
r = requests.get("https://protected.ac.uk", cookies=cook)

answered Oct 05 '22 18:10

bjw

You can find here a more detailed description of the Shibboleth authentication process.

answered Oct 05 '22 17:10

andrebask

I wrote a simple Python script capable of logging into a Shibbolized page.

First, I used Live HTTP Headers in Firefox to watch the redirects for the particular Shibbolized page I was targeting.

Then I wrote a simple script using urllib.request (in Python 3.4, but urllib2 in Python 2.x seems to have the same functionality). I found that the default redirect following of urllib.request worked for my purposes; however, I found it useful to subclass urllib.request.HTTPRedirectHandler and, in this subclass (class ShibRedirectHandler), add a handler for all the http_error_302 events.

In this subclass I just printed out the values of the parameters (for debugging purposes). Please note that in order to keep the default redirect following, you need to end the handler with return HTTPRedirectHandler.http_error_302(self, args...) (i.e. a call to the base class http_error_302 handler).

The most important component for making urllib work with Shibbolized authentication is an OpenerDirector that has cookie handling added. You build the OpenerDirector with the following:

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)
response = opener.open("https://shib.page.org")

Here is a full script that may get you started (you will need to change a few mock URLs I provided and also enter a valid username and password). This uses Python 3 modules; to make this work in Python 2, replace urllib.request with urllib2 and urllib.parse.urlencode with urllib.urlencode:

import urllib.request
import urllib.parse

#Subclass of HTTPRedirectHandler. Does not do much, but is very
#verbose: it prints out all the redirects. Compare with what you see
#when looking at your browser's redirects (using Live HTTP Headers or similar)
class ShibRedirectHandler (urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print (req)
        print (fp.geturl())
        print (code)
        print (msg)
        print (headers)
        #without this return (passing parameters onto baseclass) 
        #redirect following will not happen automatically for you.
        return urllib.request.HTTPRedirectHandler.http_error_302(self,
                                                          req,
                                                          fp,
                                                          code,
                                                          msg,
                                                          headers)

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)

#Edit: should be the URL of the site/page you want to load that is protected with Shibboleth
(opener.open("https://shibbolized.site.example").read())

#Inspect the page source of the Shibboleth login form; find the input names for the username
#and password, and edit according to the dictionary keys here to match your input names
loginData = urllib.parse.urlencode({'username':'<your-username>', 'password':'<your-password>'})
bLoginData = loginData.encode('ascii')

#By looking at the source of your Shib login form, find the URL the form action posts back to
#hard code this URL in the mock URL presented below.
#Make sure you include the URL, port number and path
response = opener.open("https://test-idp.server.example", bLoginData)
#See what you got.
print (response.read())

answered Oct 05 '22 17:10

chladni

Though this has already been answered, hopefully this helps someone. I had the task of downloading files from a SAML website and got help from Stéphane Bruckert's answer.

If headless mode is used, wait times need to be added at the points where the login redirects happen. Once the browser had logged in, I took its cookies and used them with the requests module to download the files. Got help from this.

This is what my code looks like:

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

things_to_download = ['a', 'b', 'c', 'd', 'e', 'f']   # placeholders for the values that change in the URL
options = Options()
options.headless = False
# Written against Selenium 3; Selenium 4 takes the driver path via a Service object instead.
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
driver.get('https://website.to.downloadfrom.com/')
driver.find_element(By.ID, 'username').send_keys("Your_username")  # the IDs differ between websites/forms
driver.find_element(By.ID, 'password').send_keys("Your_password")
driver.find_element(By.ID, 'logOnForm').submit()

# Copy the browser's cookies into a requests session once, before downloading.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

for thing in things_to_download:
    response = session.get('https://website.to.downloadfrom.com/bla/blabla/' + str(thing))
    with open('Downloaded_stuff/' + str(thing) + '.pdf', 'wb') as f:
        f.write(response.content)            # saving the file
driver.close()
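For the wait times mentioned above, one option is a small polling helper instead of fixed sleeps. This is only a sketch: wait_until and the fake URL sequence are my own stand-ins, not part of Selenium (which also ships its own WebDriverWait helper for the same purpose).

```python
import time


def wait_until(condition, timeout=60.0, interval=1.0):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a browser whose URL changes while the login redirects run.
urls = iter([
    "https://idp.example.org/login",
    "https://idp.example.org/login",
    "https://sp.example.org/home",
])
current = {"url": ""}


def landed_on_sp():
    # Advance the fake browser by one step, then check where it is.
    current["url"] = next(urls, current["url"])
    return current["url"].startswith("https://sp.example.org/")


print(wait_until(landed_on_sp, timeout=5.0, interval=0.01))
```

With a real driver the condition would be something like lambda: driver.current_url.startswith(target), called after submitting the login form and before reading the cookies.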

answered Oct 05 '22 16:10

TheBroda

I wrote this code following the accepted answer. It worked for me in two separate projects.

import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib


cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_cookiejar(cj)

br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_refresh(False)
br.set_handle_referer(True)
br.set_handle_robots(False)

br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]


br.open("The URL goes here")

br.select_form(nr=0)

br.form['username'] = 'Login Username'
br.form['password'] = 'Login Password'
br.submit()

br.select_form(nr=0)
br.submit()

response = br.response().read()
print response

answered Oct 05 '22 18:10

draysams

I faced a similar problem with my university page SAML authentication as well.

The basic idea is to use a requests.Session object to automatically handle most of the HTTP redirects and the cookie storage. However, many of the redirects were done via JavaScript as well, and this caused multiple problems for the simple requests solution.

I ended up using Fiddler to keep track of every request my browser made to the university server, to fill in the redirects I'd missed. It really made the process easier.

My solution is far from ideal, but seems to work.
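A rough sketch of the session-based idea, with the JavaScript form submission replaced by a manual POST. The function names, the 'username'/'password' field names and the assumption that the login form posts back to its own URL are all guesses that you would have to adapt to your IDP; the regex parsing is deliberately simple and will not cope with every page.

```python
import re
from html import unescape


def hidden_fields(html):
    """Return (action, {name: value}) for the hidden inputs of the first form."""
    action = re.search(r'<form[^>]*\baction="([^"]*)"', html, re.I)
    fields = {}
    for tag in re.findall(r'<input[^>]*>', html, re.I):
        if not re.search(r'\btype="hidden"', tag, re.I):
            continue
        name = re.search(r'\bname="([^"]*)"', tag, re.I)
        value = re.search(r'\bvalue="([^"]*)"', tag, re.I)
        if name:
            fields[unescape(name.group(1))] = unescape(value.group(1)) if value else ""
    return (unescape(action.group(1)) if action else None), fields


def saml_login(session, protected_url, username, password):
    """Run the login with one cookie-keeping session; returns the SP's final reply."""
    # Ask the SP for the resource; the redirects end on the IDP login page.
    login_page = session.get(protected_url)
    # Post the credentials to the IDP (field names and target URL are assumptions).
    idp_reply = session.post(
        login_page.url, data={"username": username, "password": password}
    )
    # The reply is the auto-submitting form; a browser would post it via JavaScript.
    acs_url, fields = hidden_fields(idp_reply.text)
    if acs_url is None or "SAMLResponse" not in fields:
        raise RuntimeError("no SAMLResponse form found; the login probably failed")
    # Deliver the SAMLResponse to the SP, which then sets its own cookies.
    return session.post(acs_url, data=fields)
```

With the requests library this would be called as saml_login(requests.Session(), url, user, password); afterwards the same session object can fetch the protected pages.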

answered Oct 05 '22 16:10

Arthur.V

If all else fails, I'd suggest using Selenium's webdriver in 'headful' mode (i.e. a browser window will open, allowing you to enter the username, password, and any other necessary login info). This gives easy access to the target website even if your form is more complex than the standard 'username' and 'password' duo and you're unsure how to fill in the br.form sections mentioned in the other answers.

from selenium import webdriver
import time

DRIVER_PATH = r'C:/INSERT_YOUR_PATH_HERE/chromedriver.exe'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get('https://moodle.tau.ac.il/login/index.php') # This is the login screen

Once you do so, you can create a loop which checks whether you've reached your destination URL. If so, you're in! This snippet of code worked for me; my goal was to access my university's coursework website, Moodle, and download all of the PDFs automatically.

targetUrl = False
timeElapsed = 0

def downloadAllPDFs():         # Or any other function you'd like, the point is that 
    print("Access Granted!")   # you now have access to the HTML. 

while not targetUrl and timeElapsed < 60:
    time.sleep(1)
    timeElapsed += 1
    if driver.current_url == r"https://moodle.tau.ac.il/my/": # The site you're trying to login to.
        downloadAllPDFs()
        targetUrl = True

answered Oct 05 '22 17:10

Yoni Friedman
