Screen-Scraping: Processing a POST Login

I'm somewhat new to screen-scraping and I'm trying to automate logging into my bank. I figured that I could essentially do the following:

  1. Using the source of the bank's web page, some tools, and some clever hackery, determine where they're posting the login data to and how it's formatted.
  2. Mimic this in Python.
  3. World domination.
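Step 1 can be partly automated. The following minimal sketch (Python 3, standard library only) lists each form's `action`, `method`, and named `<input>` fields, including hidden ones such as `TestJavaScript`. The sample HTML is hypothetical and is not the bank's actual login page. Fields that a page builds with JavaScript will not show up in a parser like this.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect each <form>'s action/method and the name/value of its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per form: {"action", "method", "fields"}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action", ""),
                "method": (attrs.get("method") or "get").upper(),
                "fields": {},
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Hidden inputs matter too: the server may expect them echoed back.
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""


# Hypothetical markup, loosely modeled on the POST fields in the capture.
SAMPLE_HTML = """
<form action="/MFCU/login.aspx" method="post">
  <input type="text" name="user">
  <input type="password" name="PIN">
  <input type="hidden" name="TestJavaScript" value="OK">
  <input type="hidden" name="signonDest" value="My Default Destination">
</form>
"""

parser = LoginFormParser()
parser.feed(SAMPLE_HTML)
form = parser.forms[0]
print(form["method"], form["action"])
print(form["fields"])
```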

So far I've made it to step 2. Here's my Python code:

#!/usr/bin/python

import urllib, argparse, sys, re

def main():
    parser = argparse.ArgumentParser(description="Attempt to log into a Mission Federal Bank Account")
    parser.add_argument("-u", "--username", required=True, dest="username")
    parser.add_argument("-p", "--password", required=True, dest="password")
    arguments = parser.parse_args(sys.argv[1:])

    post = {
        'user': arguments.username,
        'PIN': arguments.password,
        'TestJavaScript': "OK",
        'signonDest': "My Default Destination"
    }

    post_encoded = urllib.urlencode(post)

    success_test = re.compile("<title id=\"HTMLTITLE\">Account Summary</title>")

    result = urllib.urlopen("https://missionlink.missionfcu.org/MFCU/login.aspx", post_encoded)
    result_string = result.read()

    # search() scans the whole page; match() only checks position 0
    success = success_test.search(result_string)

    if success:
        print "Login Successful *devilish laugh*"
    else:
        print "Login Failed"
        print result_string

    return

if __name__ == "__main__":
    main()
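On the success check: `re.match` only tries to match at the very start of the string, while `re.search` scans the whole string, so a pattern for a `<title>` tag inside a page needs `search`. Also, a match object never compares equal to `True`, so it should be tested for truthiness instead. A small Python 3 demonstration with a made-up page:

```python
import re

# The title tag the script looks for.
success_test = re.compile(r'<title id="HTMLTITLE">Account Summary</title>')

# Hypothetical page body: the <title> is not at position 0.
page = '<html><head><title id="HTMLTITLE">Account Summary</title></head></html>'

print(success_test.match(page))               # None: match() only tries position 0
print(success_test.search(page) is not None)  # True: search() scans the whole string

# A match object is truthy, but it is never == True.
found = success_test.search(page)
print(found == True, bool(found))             # False True
```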

As you can see, it's really rather simple. All I needed, so I thought, was a URL (check) and the correct POST parameters (check). However, the bank doesn't accept my request; it won't log me in. I have determined that my methodology is correct by capturing the POST request and response via the Firefox TamperData extension. Here is a sanitized dump of the actual browser-generated POST request (this is the one that works, when done from a browser):

22:52:22.172[5239ms][total 5239ms] Status: 302[Found]
POST https://missionlink.missionfcu.org/MFCU/login.aspx Load Flags[LOAD_DOCUMENT_URI  LOAD_INITIAL_DOCUMENT_URI  ] Content Size[178] Mime Type[text/html]
    Request Headers:
      Host[missionlink.missionfcu.org]
      User-Agent[Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.16) Gecko/20110323 Ubuntu/10.04 (lucid) Firefox/3.6.16]
      Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
      Accept-Language[en-us,en;q=0.5]
      Accept-Encoding[gzip,deflate]
      Accept-Charset[ISO-8859-1,utf-8;q=0.7,*;q=0.7]
      Keep-Alive[115]
      Connection[keep-alive]
      Referer[https://www.missionfed.com/]
   Post Data:
      user[USERNAME]
      PIN[PASSWORD]
      TestJavaScript[OK]
      signonDest[My+Default+Destination]
   Response Headers:
      Date[Fri, 08 Apr 2011 05:52:38 GMT]
      Server[Microsoft-IIS/6.0]
      PICS-Label[(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (v 0 s 0 n 0 l 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (v 0 s 0 n 0 l 0))(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (l 0 s 0 v 0 o 0))]
      X-Powered-By[ASP.NET]
      X-AspNet-Version[1.1.4322]
      Location[https://missionlink.missionfcu.org/MFCU/Accounts/Summary.aspx]
      Set-Cookie[ASP.NET_SessionId=0o5zkh55rrost555z0m3xs55; path=/
TestCookie=OK; expires=Fri, 15-Apr-2011 12:52:37 GMT; path=/
AuthenticationTicket=6F527794FC5C8DAA18B6BA2E77E19DA5A256C092B0879D3CA68C111E52338F441690B94E652AC57FDBEEFD613367C076AB0EC7FA515E4CEC67C5F86B4B625D9B233B0D1B35BB0C58AE4B7CE6D6614CD0F732918E51E3B7939F284D9586B9CB132A12F3717BF80581F58440D91256D1438349E10867618F3300290C3AE7AA436572188236727041B93BD3C8C90E6F67915942FCC25CDD31C9D4F7D1C5F8A29E7C9A58825C3928F32C91146CC7BE47E86F0551CF1550EF21585C92F6C6AA245EE4D7CC5E80C4EFEB29A9572E625F79E709CA50BBF24303CE5AF06664C8784C2CDFA52CF7B6441170D4B3C5B8D4B7E6582B6072BAF7; path=/]
      Cache-Control[no-cache, no-store]
      Pragma[no-cache]
      Expires[-1]
      Content-Type[text/html; charset=utf-8]
      Content-Length[178]
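For comparison with the capture, here is a Python 3 sketch that builds (but does not send) a request resembling the browser's: the same URL, the form fields in the same order, and the captured `User-Agent` and `Referer` headers. `urlencode` encodes spaces as `+`, which is exactly the `My+Default+Destination` seen in the Post Data. Whether this server actually checks `User-Agent` or `Referer` is an assumption; copying them simply removes one difference between the script and the browser.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "https://missionlink.missionfcu.org/MFCU/login.aspx"

# Same fields as the captured "Post Data", in the same order (sanitized values).
fields = [
    ("user", "USERNAME"),
    ("PIN", "PASSWORD"),
    ("TestJavaScript", "OK"),
    ("signonDest", "My Default Destination"),
]
body = urlencode(fields).encode("ascii")  # spaces become '+', as in the capture

req = Request(
    LOGIN_URL,
    data=body,  # supplying data makes this a POST; urllib adds the form Content-Type when sending
    headers={
        # Copied from the captured browser request (assumption: may or may not be checked).
        "User-Agent": "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.16) "
                      "Gecko/20110323 Ubuntu/10.04 (lucid) Firefox/3.6.16",
        "Referer": "https://www.missionfed.com/",
    },
)

print(req.get_method())
print(body.decode())
```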

I can't seem to determine what I'm missing here. Something is clearly going on with the AuthenticationTicket cookie, but isn't that part of the response rather than the request? Again, I'm somewhat new to screen-scraping, so bear with me. Any ideas on what I'm doing wrong?
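The `AuthenticationTicket` cookie does arrive in the response, but a browser stores each `Set-Cookie` value and sends it back in a `Cookie` header on later requests to the same site, including the GET to the `Location` URL that this 302 points at. Python 2's `urllib.urlopen` follows the redirect but keeps no cookies, which would plausibly explain the failure; that is a reading of the capture, not something confirmed in this thread. The following sketch, using Python 3's `http.cookies`, shows what the browser would send next. The ticket value is a placeholder, not the real one.

```python
from http.cookies import SimpleCookie

# Set-Cookie values from the captured 302 response.
# PLACEHOLDER_TICKET stands in for the long per-login AuthenticationTicket value.
set_cookie_lines = [
    "ASP.NET_SessionId=0o5zkh55rrost555z0m3xs55; path=/",
    "TestCookie=OK; expires=Fri, 15-Apr-2011 12:52:37 GMT; path=/",
    "AuthenticationTicket=PLACEHOLDER_TICKET; path=/",
]

jar = SimpleCookie()
for line in set_cookie_lines:
    jar.load(line)  # path/expires become attributes of the cookie, not extra cookies

# The Cookie header a browser would attach to its next request (e.g. GET Summary.aspx).
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)
```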

Naftuli Kay asked Nov 28 '25 15:11


1 Answer

It may be useful for you to check out mechanize for complex browser automation such as this.
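mechanize keeps cookies across requests for you. If you'd rather stay in Python 3's standard library, a similar effect can be sketched by building an opener around `http.cookiejar.CookieJar`: cookies set by the login response, including the 302, are stored and sent on the redirected request. This assumes the login fields and success title from the question still apply; `make_session` and `log_in` are hypothetical helper names.

```python
import http.cookiejar
import re
import urllib.parse
import urllib.request

LOGIN_URL = "https://missionlink.missionfcu.org/MFCU/login.aspx"
SUCCESS_TITLE = re.compile(r'<title id="HTMLTITLE">Account Summary</title>')


def make_session():
    """Return an opener that stores cookies from responses and resends them, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def log_in(opener, username, password):
    """POST the login form; the opener follows the 302 with the new cookies attached."""
    body = urllib.parse.urlencode({
        "user": username,
        "PIN": password,
        "TestJavaScript": "OK",
        "signonDest": "My Default Destination",
    }).encode("ascii")
    with opener.open(LOGIN_URL, data=body) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return SUCCESS_TITLE.search(html) is not None
```

Usage would be `opener, jar = make_session()` followed by `log_in(opener, username, password)`; reuse the same `opener` for later pages so the session cookies keep going out.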

Also, have you heard of Charles Proxy? It's essentially like Wireshark, but tailored for web development, and I suspect it will help you enormously during development.
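To see what the script itself sends (rather than the browser), its traffic can be routed through Charles. A sketch, assuming Charles is listening locally on port 8888. For HTTPS, Charles re-signs traffic with its own root certificate, so Python will reject the connection unless that certificate (exported from Charles as PEM) is passed as `ca_file`. `make_proxied_opener` is a hypothetical name.

```python
import ssl
import urllib.request

# Assumption: Charles (or another debugging proxy) listens on 127.0.0.1:8888.
PROXY = "http://127.0.0.1:8888"


def make_proxied_opener(proxy_url=PROXY, ca_file=None):
    """Build an opener that sends HTTP and HTTPS requests through proxy_url.

    ca_file: optional path to the proxy's exported root certificate (PEM);
    when given, only that certificate is trusted for HTTPS verification.
    """
    handlers = [urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})]
    if ca_file is not None:
        context = ssl.create_default_context(cafile=ca_file)
        handlers.append(urllib.request.HTTPSHandler(context=context))
    return urllib.request.build_opener(*handlers)
```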

Kyle Wild answered Dec 01 '25 07:12