Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to download pages that redirect after login?

Tags:

bash

perl

Normally pages that have a login form can be downloaded with

wget --no-check-certificate --save-cookies cookies --keep-session-cookies \
     --post-data="username=example&password=example" \
     "https://example.com/index.php?title=Special:Userlogin&returntotitle="

wget --no-check-certificate --load-cookies=cookies \
     --no-parent -r --level=2 -nc -E \
     https://example.com/Special:Sitemap

But in the case of DekiWiki sites, this doesn't work, if login is required.

The problem seams to be described in man wget

Note: if Wget is redirected after the POST request is completed, it will not send the POST data to the redirected URL. This is because URLs that process POST often respond with a redirection to a regular page, which does not desire or accept POST. It is not completely clear that this behavior is optimal; if it doesn't work out, it might be changed in the future.

Question

Can this be done using Perl e.g. with perhaps HTML::TreeBuilder 3 or HTML::TokeParser or Mechanize or any other Perl module?

like image 992
Sandra Schlichting Avatar asked Oct 10 '22 03:10

Sandra Schlichting


1 Answers

Some sites that require a login do not send the cookie back with the response.

Instead they send a redirection response (302 Object Moved), which most browsers follow automatically and then the cookie is sent in the response for that redirect page.

I use curl to do this by enabling the curl_opt FOLLOW_LOCATION, for the command line tool one uses the -location option. It is a free tool like wget.

curl --cookie cookie.txt --cookie-jar cookie.txt \
     --data-urlencode "username=example&password=example" \
     --insecure --location https://example.com/index.php?title=Special:Userlogin&returntotitle= -o downloadedfile.html https://example.com/Special:Sitemap

http://curl.haxx.se/download.html

Also, sometimes a login form expects a multi-part/form-data post instead of just a application/x-www-form-urlencoded post. To make curl do a multi-part/form-data post change to he --data-urlencode to -F.

like image 141
Motes Avatar answered Oct 13 '22 12:10

Motes