Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logging in using WWW::Mechanize

Tags:

perl

I'm looking at logging in to https://imputationserver.sph.umich.edu/index.html#!pages/login with the following:

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';
use WWW::Mechanize;
use DDP;

my $mech = WWW::Mechanize->new();
$mech->get( 'https://imputationserver.sph.umich.edu/index.html#!pages/login' );
my $username = '';
my $password = '';
#$mech->set_visible( $username, $password );
#$mech -> field('Username:', $username);
#$mech -> field('Password:', $password);

my %data;
@{ $data{links} } = $mech -> find_all_links();
@{ $data{inputs}    } = $mech -> find_all_inputs();
@{ $data{submits} } = $mech ->find_all_submits();
@{ $data{forms} } = $mech -> forms();
p %data;

#$mech->set_fields('Username' => $username, 'Password' => $password);

but there doesn't appear to be any useful information, which is shown by printing:

{
    forms     [],
    inputs    [],
    links     [
        [0] WWW::Mechanize::Link  {
            public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
            private methods (0)
            internals: [
                [0] "favicon.ico",
                [1] undef,
                [2] undef,
                [3] "link",
                [4] URI::https,
                [5] {
                    href   "favicon.ico",
                    rel    "icon"
                }
            ]
        },
        [1] WWW::Mechanize::Link  {
            public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
            private methods (0)
            internals: [
                [0] "assets/css/loader.css",
                [1] undef,
                [2] undef,
                [3] "link",
                [4] var{links}[0][4],
                [5] {
                    href   "assets/css/loader.css",
                    rel    "stylesheet"
                }
            ]
        }
    ],
    submits   []
}

I looked on Firefox's Tools -> page info, but got nothing valuable there, I don't see where the username and password are coming from on this page.

I've tried

$mech -> submit_form(
    form_number => 0,
    fields      => { username => $username, password => $password },
);

but then I get No form defined

In terms of links, inputs, fields, I don't see any, and I don't know how to move on.

I don't see anything on https://metacpan.org/pod/WWW::Mechanize::Examples that helps me out in this situation.

How can I log in to this page using Perl's WWW::Mechanize?

like image 956
con Avatar asked Dec 06 '22 08:12

con


2 Answers

As Dave says, many modern websites are going to be handling login via a Javascript-driven (private) API. You'll need to open the Network tab in your browser, do the login manually as you normally would, and watch the sequence of GETs, PUTs, POSTs, etc. that happen to see what interaction is needed to complete a login, and then execute that sequence yourself with Mech or LWP.

It's possible that the Javascript on the page is going to create JSON or even JWTs to do the interactions; you'll have to duplicate that in your code for it to work.

In particular, check the headers for cookies, and authentication and CSRF tokens being set; you'll need to capture those and re-send them with requests (POST requests will need the CSRF tokens). This may entail doing more interactions with the site to capture the sequence of operations and duplicate them. HTTP::Cookies should handle the cookies for you automatically, but more sophisticated header usage will require you to use HTTP::Headers to extract the data and possibly resubmit it that way.

At heart, the processes are all pretty simple; it's just a matter of accurately replicating them so that you can automate them.

You should check as to whether the site already has a programmer's API, and use that if so; such an API will almost always provide you simpler, direct interfaces to site functions and easier-to-use returned data formats. If the site is highly dynamic, like a heavy React site, it's possible that other pages in the site are going to load a skeletal HTML page and then use Javascript to fill it out as well; as the page evolves, your code will have to as well. If you're using a defined programmer's API, you will probably be able to depend on the interactions and returned data remaining the same as long as the API version doesn't change.

A final note: you should verify that you're not violating your user agreement by using automation. Some sites explicitly bar using automated methods of logging in.

like image 124
Joe McMahon Avatar answered Dec 22 '22 00:12

Joe McMahon


The interesting part of the source from that page is this:

<body class="bg-light">

  <div id="main">
    <div class="spinner">
        <div class="bounce1"></div>
      <div class="bounce2"></div>
      <div class="bounce3"></div>
    </div>
  </div>

  <script src="./dist/bundles/cloudgene/index.js"></script>


</body>

So, there's no login form in the HTML that makes up that page. Which explains why WWW::Mechanize can't see anything - there's nothing there to see.

It seems that that the page is all built by that Javascript file - index.js.

Now, you could spend hours reading that JS and working exactly how the page works. But that'll be hard work and there's an easier way.

No matter how the client (the browser or your code) works, the actual login must be handled by an HTTP request and response. The client sends a request, the server responds and the client acts on that response. You just need to work out what the request and response look like and then reproduce that in your code.

And you can examine the HTTP requests and response using tools that are almost certainly built into your browser (in Chrome, it's dot menu -> more tools -> developer tools). That will allow you to see exactly what the HTTP request looks like.

Having done that, you "just" need to craft a similar response using your Perl code. You'll probably find that's easier using LWP::UserAgent and its associated modules rather than WWW::Mechanize.

like image 29
Dave Cross Avatar answered Dec 22 '22 00:12

Dave Cross