I'm looking at logging in to https://imputationserver.sph.umich.edu/index.html#!pages/login with the following:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';
use WWW::Mechanize;
use DDP;
my $mech = WWW::Mechanize->new();
$mech->get( 'https://imputationserver.sph.umich.edu/index.html#!pages/login' );
my $username = '';
my $password = '';
#$mech->set_visible( $username, $password );
#$mech -> field('Username:', $username);
#$mech -> field('Password:', $password);
my %data;
@{ $data{links} } = $mech -> find_all_links();
@{ $data{inputs} } = $mech -> find_all_inputs();
@{ $data{submits} } = $mech ->find_all_submits();
@{ $data{forms} } = $mech -> forms();
p %data;
#$mech->set_fields('Username' => $username, 'Password' => $password);
but the page doesn't appear to contain any useful information, as the printed output shows:
{
forms [],
inputs [],
links [
[0] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "favicon.ico",
[1] undef,
[2] undef,
[3] "link",
[4] URI::https,
[5] {
href "favicon.ico",
rel "icon"
}
]
},
[1] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "assets/css/loader.css",
[1] undef,
[2] undef,
[3] "link",
[4] var{links}[0][4],
[5] {
href "assets/css/loader.css",
rel "stylesheet"
}
]
}
],
submits []
}
I looked at Firefox's Tools -> Page Info, but found nothing valuable there. I don't see where the username and password fields come from on this page.
I've tried
$mech -> submit_form(
form_number => 0,
fields => { username => $username, password => $password },
);
but then I get No form defined
In terms of links, inputs, fields, I don't see any, and I don't know how to move on.
I don't see anything on https://metacpan.org/pod/WWW::Mechanize::Examples that helps me out in this situation.
How can I log in to this page using Perl's WWW::Mechanize?
As Dave says, many modern websites are going to be handling login via a Javascript-driven (private) API. You'll need to open the Network tab in your browser, do the login manually as you normally would, and watch the sequence of GETs, PUTs, POSTs, etc. that happen to see what interaction is needed to complete a login, and then execute that sequence yourself with Mech or LWP.
It's possible that the Javascript on the page is going to create JSON or even JWTs to do the interactions; you'll have to duplicate that in your code for it to work.
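As a rough illustration of duplicating a JSON login call, here is a minimal sketch that builds (but does not send) such a request. The URL `https://example.com/api/login`, the field names `username` and `password`, and the helper name `build_json_login` are all placeholders I made up; the real values have to come from what you see in the Network tab.

```perl
use strict;
use warnings;
use HTTP::Request;
use JSON::PP qw(encode_json decode_json);

# Build (but do not send) a JSON login request.
# The URL and field names passed in are hypothetical placeholders,
# not the real site's API.
sub build_json_login {
    my ( $url, $username, $password ) = @_;

    # Serialise the credentials the way a Javascript client typically would.
    my $body = encode_json( { username => $username, password => $password } );

    return HTTP::Request->new(
        POST => $url,
        [   'Content-Type' => 'application/json',
            'Accept'       => 'application/json',
        ],
        $body,
    );
}

my $request = build_json_login( 'https://example.com/api/login', 'alice', 's3cret' );

# Inspect the request before sending it anywhere.
print $request->as_string;
```

Once the printed request matches what the browser sends, it can be handed to `$mech->request($request)` or to an LWP::UserAgent object's `request` method.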
In particular, check the headers for cookies, and authentication and CSRF tokens being set; you'll need to capture those and re-send them with requests (POST requests will need the CSRF tokens). This may entail doing more interactions with the site to capture the sequence of operations and duplicate them. HTTP::Cookies should handle the cookies for you automatically, but more sophisticated header usage will require you to use HTTP::Headers to extract the data and possibly resubmit it that way.
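To make the capture-and-resend step concrete, here is a small offline sketch. The server's reply is built by hand so nothing touches the network. The header name `X-CSRF-Token`, the cookie `session=abc123`, the token value, and the `example.com` URLs are all assumptions for illustration; a real site may deliver its token in a different header, in a cookie, or in the response body.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# Hand-built stand-in for the server's reply to a login request.
# Every header value here is hypothetical.
my $login_request  = HTTP::Request->new( POST => 'https://example.com/api/login' );
my $login_response = HTTP::Response->new(
    200, 'OK',
    [   'Set-Cookie'   => 'session=abc123; Path=/',
        'X-CSRF-Token' => 'tok-42',
    ],
);

# The cookie jar needs to know which request produced the response.
$login_response->request($login_request);

# Capture the session cookie and the CSRF token from the response headers.
my $jar = HTTP::Cookies->new;
$jar->extract_cookies($login_response);
my $csrf = $login_response->header('X-CSRF-Token');

# Attach both to the next request to the same host.
my $next = HTTP::Request->new( POST => 'https://example.com/api/jobs' );
$jar->add_cookie_header($next);
$next->header( 'X-CSRF-Token' => $csrf );

print $next->as_string;
```

In real use you would not call `extract_cookies` and `add_cookie_header` yourself: WWW::Mechanize keeps a cookie jar by default, and LWP::UserAgent does so when constructed with `cookie_jar => {}`. Only the token header has to be copied by hand.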
At heart, the processes are all pretty simple; it's just a matter of accurately replicating them so that you can automate them.
You should check as to whether the site already has a programmer's API, and use that if so; such an API will almost always provide you simpler, direct interfaces to site functions and easier-to-use returned data formats. If the site is highly dynamic, like a heavy React site, it's possible that other pages in the site are going to load a skeletal HTML page and then use Javascript to fill it out as well; as the page evolves, your code will have to as well. If you're using a defined programmer's API, you will probably be able to depend on the interactions and returned data remaining the same as long as the API version doesn't change.
A final note: you should verify that you're not violating your user agreement by using automation. Some sites explicitly bar using automated methods of logging in.
The interesting part of the source from that page is this:
<body class="bg-light">
<div id="main">
<div class="spinner">
<div class="bounce1"></div>
<div class="bounce2"></div>
<div class="bounce3"></div>
</div>
</div>
<script src="./dist/bundles/cloudgene/index.js"></script>
</body>
So, there's no login form in the HTML that makes up that page. Which explains why WWW::Mechanize can't see anything - there's nothing there to see.
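You can confirm this without touching the network. The sketch below feeds the body shown above to HTML::Form, the module WWW::Mechanize relies on for form handling. The base URL passed as the second argument is only used to resolve relative links.

```perl
use strict;
use warnings;
use HTML::Form;

# The static HTML the server actually returns (copied from the page source).
my $html = <<'HTML';
<body class="bg-light">
<div id="main">
<div class="spinner">
<div class="bounce1"></div>
<div class="bounce2"></div>
<div class="bounce3"></div>
</div>
</div>
<script src="./dist/bundles/cloudgene/index.js"></script>
</body>
HTML

# Parse the markup for <form> elements, as a non-Javascript client would.
my @forms = HTML::Form->parse( $html, 'https://imputationserver.sph.umich.edu/' );

printf "forms found: %d\n", scalar @forms;    # prints "forms found: 0"
```

The form only comes into existence after a browser has run the script, which a plain HTTP client never does.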
It seems that the page is all built by that Javascript file - index.js.
Now, you could spend hours reading that JS and working out exactly how the page works. But that'll be hard work and there's an easier way.
No matter how the client (the browser or your code) works, the actual login must be handled by an HTTP request and response. The client sends a request, the server responds and the client acts on that response. You just need to work out what the request and response look like and then reproduce that in your code.
And you can examine the HTTP requests and responses using tools that are almost certainly built into your browser (in Chrome, it's the three-dot menu -> More tools -> Developer tools). That will allow you to see exactly what the HTTP request looks like.
Having done that, you "just" need to craft a similar request using your Perl code. You'll probably find that's easier using LWP::UserAgent and its associated modules rather than WWW::Mechanize.
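As a starting point, a sketch along these lines might work if the login turns out to be a plain form-encoded POST. The endpoint `https://example.com/login`, the field names, and the helper name `login` are placeholders of mine, not anything taken from the real site; substitute whatever the developer tools show.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical endpoint: replace with the URL seen in the developer tools.
my $LOGIN_URL = 'https://example.com/login';

# Send the credentials as a form-encoded POST and return the response.
# Dies if the server answers with an HTTP error status.
sub login {
    my ( $ua, $username, $password ) = @_;

    # Field names are assumptions; use the ones the browser actually sends.
    my $response = $ua->post( $LOGIN_URL,
        { username => $username, password => $password } );

    die 'Login failed: ' . $response->status_line . "\n"
        unless $response->is_success;

    return $response;
}

# An in-memory cookie jar keeps any session cookie for later requests.
my $ua = LWP::UserAgent->new( cookie_jar => {} );
```

Calling `my $response = login( $ua, $username, $password );` then performs the request, and `$response->decoded_content` gives you the body to inspect. A 2xx status only means the HTTP exchange succeeded, so check the body or the cookies as well to confirm that you are really logged in.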