How can I get WWW-Mechanize to login to Wells Fargo's website?

I am trying to use Perl's WWW::Mechanize to log in to my bank and pull transaction information. After logging in through a browser to my bank (Wells Fargo), it briefly displays a temporary page saying something along the lines of "please wait while we verify your identity". After a few seconds it proceeds to the bank's web page where I can get my bank data. The only difference is that the final URL contains several more GET parameters appended to the URL of the temporary page, which had only a sessionID parameter.

I was able to successfully get WWW::Mechanize to login from the login page, but it gets stuck on the temporary page. There is a <meta http-equiv="Refresh"... tag in the header, so I tried $mech->follow_meta_redirect but it didn't get me past that temporary page either.

Any help to get past this would be appreciated. Thanks in advance.

Here is the barebones code that gets me stuck at the temporary page:

#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;

# Credentials must be declared under "use strict"
my $userid   = 'your_userid';
my $password = 'your_password';

my $mech = WWW::Mechanize->new();
$mech->agent_alias( 'Linux Mozilla' );

$mech->get( "https://www.wellsfargo.com" );
$mech->submit_form(
    form_number => 2,
    fields => {
        userid   => $userid,
        password => $password,
    },
    button => "btnSignon"
);
asked Apr 29 '10 by J Miller

4 Answers

Sorry, it has been years since I've coded Perl. However, since there's no "copy and paste" answer posted for this question yet, here's how to scrape Wells Fargo in Ruby:

require 'rubygems'
require 'mechanize'

username = 'your_username'
password = 'your_password'

agent = Mechanize.new
agent.user_agent_alias = 'Windows IE 6'

# get first page
page = agent.get('https://online.wellsfargo.com/signon/')

# find and fill form
form = page.form_with(:name => 'Signon')      
form['userid'] = username
form['password'] = password
page = agent.submit form

# find the refresh url
page.body.match /content="1;URL=(.*?)"/
nexturl = $1

# wait a little while and then get the next page
sleep 3
page = agent.get nexturl

# If you have multiple accounts, you can use this. If you just have a single account, you can remove this block
companies = [['Account1', '123456789'], 
             ['Account2', '123456789']]

companies.each do |name, id|
  form = page.form_with(:name => 'ChangeViewFormBean')
  form['viewKey'] = id
  page = agent.submit form

  available_balance = page.search("#cashTotalAvailBalance").text.strip

  puts "#{name}: #{available_balance}"
  sleep 2
end
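The meta-refresh trick in the script above can be isolated into a small helper, which makes it easier to test on its own. This is just a sketch; the sample HTML below is made up, and a real page may format the content attribute slightly differently:

```ruby
# Extract the target URL from a <meta http-equiv="Refresh"> tag.
# Returns nil if no refresh tag is found.
def refresh_url(html)
  html =~ /content="(\d+);\s*URL=(.*?)"/i ? $2 : nil
end

sample = '<meta http-equiv="Refresh" content="1;URL=https://example.com/next?session=abc123">'
refresh_url(sample)  # => "https://example.com/next?session=abc123"
```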

Works Cited: There's a guy who wrote a version of this script, posted it to his code directory and then forwarded the whole thing to his blog. His last name is Youngblood or similar. I found the source in the Internet Archive's Wayback Machine and modified it to make what you see above. So, thanks Mr. Youngblood or similar, wherever you are - and thanks for teaching me the meta scrape trick!

answered Oct 03 '22 by johnnygoodman


You'll need to reverse-engineer what's happening on that intermediary page. Does it use JavaScript to set some cookies, for example? Mech won't parse or execute JavaScript, so it may be following the meta-refresh but missing some crucial piece of state needed for the final request.

Try using a tool like Firebug to watch the request that's sent when the browser follows the meta-refresh. Examine all the request headers, including cookies, that are sent to request the final page. Then use Mech to duplicate that.
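Once you know which headers and cookies the browser sends, replaying them is mechanical. A minimal Ruby sketch using only the standard library; the URL, cookie names, and values here are all placeholders for whatever the traffic inspector actually shows:

```ruby
require 'net/http'
require 'uri'

# Rebuild the browser's request to the final page, copying the headers
# observed in the traffic log. Everything below is an invented example.
uri = URI('https://example.com/account?sessionID=abc123')
req = Net::HTTP::Get.new(uri)
req['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686)'
req['Referer']    = 'https://example.com/wait'
req['Cookie']     = 'SESSION=abc123; VERIFIED=1'

# Uncomment to actually send the request:
# res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```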

answered Oct 03 '22 by friedo


If you know the location of the next page, you can try fetching it directly with the extra GET parameters appended, setting any required request headers with

$mech->add_header($name => $value);
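Building the next-page URL with the extra query parameters is straightforward with Ruby's URI module. A minimal sketch; the parameter names and values here are invented, so substitute the ones you actually observe in the browser's address bar:

```ruby
require 'uri'

# Append extra GET parameters to a base URL (placeholder names/values).
base = URI('https://example.com/accounts')
base.query = URI.encode_www_form(sessionID: 'abc123', token: 'xyz', view: 'summary')
base.to_s  # => "https://example.com/accounts?sessionID=abc123&token=xyz&view=summary"
```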
answered Oct 03 '22 by Narthring


First you need to determine whether JavaScript is involved: I recommend using Web Developer (or NoScript) to disable JavaScript and then trying to log in via the browser (but first clear all cookies related to your target site!).

If you can still log in with JavaScript disabled, then this is not a JavaScript issue and you need to investigate the HTTP headers (the missing piece may be, for example, the x,y coordinates of the clicked button, or a cookie received only when you load a CSS file, etc.).

I recommend HttpFox for inspecting HTTP headers. Start HttpFox logging and then perform the login again (by the way, disabling images beforehand will significantly reduce the size of the log). Then check every request and its corresponding response to find where hidden cookies are set or hidden form parameters are created.

If you cannot log in after disabling JavaScript, then you still need to look at the headers. Compare the cookies provided in each HTTP response with the cookies sent in the later requests. Once you find the HTML containing the responsible JavaScript, you can analyze it to work out the algorithm by which that cookie (or form parameter) is created.

Your last step will be to reproduce that cookie or form parameter in your WWW::Mechanize request.
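The cookie comparison above can be mechanized: collect the Set-Cookie headers logged before and after the intermediate page and diff the cookie names. A small sketch, with invented cookie values:

```ruby
# Given Set-Cookie header values captured from a traffic log (sample values
# are invented), extract the cookie names.
def cookie_names(set_cookie_headers)
  set_cookie_headers.map { |h| h[/\A\s*([^=;]+)=/, 1] }
end

before = cookie_names(['SESSION=abc123; Path=/', 'LANG=en; Path=/'])
after  = cookie_names(['SESSION=abc123; Path=/', 'LANG=en; Path=/',
                       'VERIFY_TOKEN=deadbeef; Path=/; Secure'])

# A cookie present only afterwards is a candidate set by the hidden step.
after - before  # => ["VERIFY_TOKEN"]
```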

answered Oct 03 '22 by gangabass