I'm somewhat new to screen-scraping and I'm trying to automate logging into my bank. I figured that I could essentially do the following:
So far I've made it to step 2. Here's my Python code:
```python
#!/usr/bin/python
# Python 2 script: uses urllib.urlencode / urllib.urlopen and print statements
import urllib, argparse, sys, re

def main():
    parser = argparse.ArgumentParser(description="Attempt to log into a Mission Federal Bank Account")
    parser.add_argument("-u", "--username", required=True, dest="username")
    parser.add_argument("-p", "--password", required=True, dest="password")
    arguments = parser.parse_args(sys.argv[1:])

    # Form fields copied from the browser's POST
    post = {
        'user': arguments.username,
        'PIN': arguments.password,
        'TestJavaScript': "OK",
        'signonDest': "My Default Destination"
    }
    post_encoded = urllib.urlencode(post)

    success_test = re.compile("<title id=\"HTMLTITLE\">Account Summary</title>")
    # Passing data makes urlopen send a POST; a 302 redirect is followed automatically
    result = urllib.urlopen("https://missionlink.missionfcu.org/MFCU/login.aspx", post_encoded)
    result_string = result.read()

    # search() scans the whole page; match() only tries the very start of the string
    success = success_test.search(result_string)
    if success:  # a match object is truthy, but never == True
        print "Login Successful *devilish laugh*"
    else:
        print "Login Failed"
        print result_string
    return

if __name__ == "__main__":
    main()
```
As you can see, it's really rather simple. All I needed, so I thought, was a URL (check) and the correct POST parameters (check). However, the bank doesn't accept my request; it won't log me in. I have confirmed that my methodology is correct by capturing the POST request and response with the Firefox TamperData extension. Here is a sanitized dump of the actual browser-generated POST request (this is the one that works when done from a browser):
```
22:52:22.172[5239ms][total 5239ms] Status: 302[Found]
POST https://missionlink.missionfcu.org/MFCU/login.aspx Load Flags[LOAD_DOCUMENT_URI LOAD_INITIAL_DOCUMENT_URI ] Content Size[178] Mime Type[text/html]
Request Headers:
Host[missionlink.missionfcu.org]
User-Agent[Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.16) Gecko/20110323 Ubuntu/10.04 (lucid) Firefox/3.6.16]
Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
Accept-Language[en-us,en;q=0.5]
Accept-Encoding[gzip,deflate]
Accept-Charset[ISO-8859-1,utf-8;q=0.7,*;q=0.7]
Keep-Alive[115]
Connection[keep-alive]
Referer[https://www.missionfed.com/]
Post Data:
user[USERNAME]
PIN[PASSWORD]
TestJavaScript[OK]
signonDest[My+Default+Destination]
Response Headers:
Date[Fri, 08 Apr 2011 05:52:38 GMT]
Server[Microsoft-IIS/6.0]
PICS-Label[(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (v 0 s 0 n 0 l 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (v 0 s 0 n 0 l 0))(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (l 0 s 0 v 0 o 0))]
X-Powered-By[ASP.NET]
X-AspNet-Version[1.1.4322]
Location[https://missionlink.missionfcu.org/MFCU/Accounts/Summary.aspx]
Set-Cookie[ASP.NET_SessionId=0o5zkh55rrost555z0m3xs55; path=/
TestCookie=OK; expires=Fri, 15-Apr-2011 12:52:37 GMT; path=/
AuthenticationTicket=6F527794FC5C8DAA18B6BA2E77E19DA5A256C092B0879D3CA68C111E52338F441690B94E652AC57FDBEEFD613367C076AB0EC7FA515E4CEC67C5F86B4B625D9B233B0D1B35BB0C58AE4B7CE6D6614CD0F732918E51E3B7939F284D9586B9CB132A12F3717BF80581F58440D91256D1438349E10867618F3300290C3AE7AA436572188236727041B93BD3C8C90E6F67915942FCC25CDD31C9D4F7D1C5F8A29E7C9A58825C3928F32C91146CC7BE47E86F0551CF1550EF21585C92F6C6AA245EE4D7CC5E80C4EFEB29A9572E625F79E709CA50BBF24303CE5AF06664C8784C2CDFA52CF7B6441170D4B3C5B8D4B7E6582B6072BAF7; path=/]
Cache-Control[no-cache, no-store]
Pragma[no-cache]
Expires[-1]
Content-Type[text/html; charset=utf-8]
Content-Length[178]
```
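The Post Data in the capture is what `urlencode` produces from the script's dictionary: it encodes values with `quote_plus`, so spaces become `+`, exactly as in `signonDest[My+Default+Destination]`. A quick Python 3 check, using the sanitized placeholder values from the dump (under Python 2 the field order may differ, which a form handler normally ignores):

```python
from urllib.parse import urlencode, parse_qs

# Same fields as the script, with the placeholder values from the capture
post = {
    "user": "USERNAME",
    "PIN": "PASSWORD",
    "TestJavaScript": "OK",
    "signonDest": "My Default Destination",
}

body = urlencode(post)  # spaces encoded as '+' via quote_plus
print(body)
# user=USERNAME&PIN=PASSWORD&TestJavaScript=OK&signonDest=My+Default+Destination
```

If the bodies match, the difference between the browser and the script presumably lies in what happens around the request (headers, cookies, the redirect), not in the form data itself.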
I can't determine what I'm missing here. Something is clearly going on with the AuthenticationTicket cookie, but isn't that part of the response rather than the request? Again, I'm somewhat new to screen-scraping, so bear with me. Any ideas on what I'm doing wrong?
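The AuthenticationTicket cookie does arrive in the response, and that is why it matters: the 302 sends the client on to `Summary.aspx`, and that follow-up request has to carry the cookie back. `urllib.urlopen` follows the redirect but does not remember cookies, so the summary page most likely sees an unauthenticated request. Below is a minimal Python 3 sketch using only the standard library's `http.cookiejar` and `urllib.request`. It assumes the form fields from the dump are all the server needs; the helper names `make_session`, `login`, and `logged_in` are hypothetical, and the extra `User-Agent`/`Referer` headers are a guess, not something the capture proves is required.

```python
import http.cookiejar
import re
import urllib.parse
import urllib.request

LOGIN_URL = "https://missionlink.missionfcu.org/MFCU/login.aspx"
SUCCESS_RE = re.compile(r'<title id="HTMLTITLE">Account Summary</title>')


def make_session():
    """Hypothetical helper: an opener that stores cookies from every response
    and sends them back on later requests, including redirect hops."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Assumption: browser-like headers may matter to the server
    opener.addheaders = [
        ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)"),
        ("Referer", "https://www.missionfed.com/"),
    ]
    return opener, jar


def login(opener, username, password, url=LOGIN_URL):
    """POST the same form fields the browser sent; return (final_url, html)."""
    data = urllib.parse.urlencode({
        "user": username,
        "PIN": password,
        "TestJavaScript": "OK",
        "signonDest": "My Default Destination",
    }).encode("ascii")
    # The redirect handler turns the 302 into a GET for the Location URL;
    # HTTPCookieProcessor has already stored the Set-Cookie values from the 302,
    # so the AuthenticationTicket cookie is attached to that GET.
    with opener.open(url, data) as resp:
        return resp.geturl(), resp.read().decode("utf-8", errors="replace")


def logged_in(html):
    # search() scans the whole page; match() would only test position 0
    return SUCCESS_RE.search(html) is not None
```

Usage would be `opener, jar = make_session()` followed by `url, html = login(opener, "USERNAME", "PASSWORD")`; reusing the same `opener` for later page fetches keeps the session cookies attached.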
It may be useful for you to check out mechanize for complex browser automation such as this.
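A rough idea of what that looks like with mechanize, which keeps cookies and follows redirects like a browser. This is a sketch under assumptions: the start URL is taken from the `Referer` in the dump, and the form index and field names (`user`, `PIN`) are guesses based on the captured Post Data. mechanize does not run JavaScript, so if `TestJavaScript` is a hidden field filled in by script, you may have to set it yourself.

```python
def mechanize_login(username, password, start_url="https://www.missionfed.com/"):
    """Hypothetical helper: fill in and submit the sign-on form with mechanize.

    Assumptions: the sign-on form is the first form on start_url and uses
    the field names seen in the captured POST ("user" and "PIN").
    """
    import mechanize  # third-party package: pip install mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)  # do not fetch or obey robots.txt
    br.open(start_url)
    br.select_form(nr=0)         # assumption: the first form on the page
    br["user"] = username
    br["PIN"] = password
    response = br.submit()       # br keeps cookies and follows the 302 itself
    return br.geturl(), response.read()
```

The import sits inside the function so the sketch can be loaded without mechanize installed; a real script would import it at the top.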
Also, have you heard of Charles Proxy? It's essentially like Wireshark, but tailored for web development, and I suspect it will help you enormously during development.
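To compare the script's traffic with the browser's in Charles, the script has to go through the proxy too. A minimal Python 3 sketch with `urllib.request.ProxyHandler`, assuming Charles listens on `127.0.0.1:8888` (adjust to your proxy settings). For HTTPS you would likely also need to enable SSL proxying in Charles and make Python trust its root certificate, otherwise certificate verification fails.

```python
import urllib.request


def proxied_opener(*extra_handlers, proxy="http://127.0.0.1:8888"):
    """Hypothetical helper: build an opener that sends HTTP and HTTPS requests
    through a local debugging proxy. Assumption: the proxy address above."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    # extra_handlers lets you combine it with e.g. an HTTPCookieProcessor
    return urllib.request.build_opener(handler, *extra_handlers)
```

For example, `proxied_opener(urllib.request.HTTPCookieProcessor())` gives a cookie-aware opener whose requests show up in the proxy, side by side with the browser's.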