I am having difficulty using Perl to visit a website via Tor when it is an HTTPS site, but not when it is an HTTP site.
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use LWP::Protocol::socks;
use LWP::Protocol::https;
use utf8;
my $mech = WWW::Mechanize->new(timeout => 60*5);
$mech->proxy(['http', 'https'], 'socks://localhost:9150');
$mech->get("https://www.google.com");
I am receiving the error message: Error GETing https://www.google.com: Status read failed: Bad file descriptor at line 10, where line 10 is the last line of the program.
In the Tor Browser I can successfully view https://www.google.com through port 9150. I am using ActivePerl 5.16.2, Vidalia 0.2.21, and Tor 0.2.3.25. I am on a Windows machine and my primary internet browser is Mozilla.
I have tried installing packages with the commands:
cpan LWP::UserAgent
ppm install LWP::Protocol::https
cpan LWP::Protocol::https
ppm install LWP::Protocol::socks
cpan LWP::Protocol::socks
ppm install Mozilla::CA
ppm install IO::Socket::SSL
ppm install Crypt::SSLeay
cpan Crypt::SSLeay
Thank you for any help! Please let me know whether there is any further information that I can provide.
Some time ago I found a way to reach HTTPS sites through Tor by using WWW::Curl::Easy to fetch them, because I ran into the same problems with LWP. After fetching, I save all the HTML to files and parse them with WWW::Mechanize or HTML::TreeBuilder.
If you want more interactivity with the site (posting forms, etc.), this solution can be more tedious because you will need to drive curl yourself; a rough POST sketch follows, and below that is the helper package I use.
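For example, submitting a form means setting the POST options on the curl handle directly. This is only a minimal sketch; the URL and field names are made up:
use WWW::Curl::Easy;

my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, 'https://example.com/login');      # hypothetical form URL
$curl->setopt(CURLOPT_POST, 1);                                # send a POST instead of a GET
$curl->setopt(CURLOPT_POSTFIELDS, 'user=alice&pass=secret');   # urlencoded form data
my $response_body;
$curl->setopt(CURLOPT_WRITEDATA, \$response_body);             # collect the response body here
my $retcode = $curl->perform;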
package Curl;
use strict;
use warnings;
use WWW::Curl::Easy;
use WWW::UserAgent::Random;

my $curl = WWW::Curl::Easy->new;
my $useragent = rand_ua("browsers");   # random browser user agent string
my $host = 'localhost';
my $port = '9070';
my $timeout = '20';
my $connectTimeOut = '20';

&init;

sub get
{
    my $url = shift;
    $curl->setopt(CURLOPT_URL, $url);
    my $response_body;
    $curl->setopt(CURLOPT_WRITEDATA, \$response_body);   # collect the body in this scalar
    my $retcode = $curl->perform;
    if ($retcode == 0) {
        my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
        print("Transfer went ok, HTTP code = $response_code\n");
        # judge result and next action based on $response_code
        return \$response_body;
    } else {
        # error code, type of error, error message
        print("An error happened: $retcode ".$curl->strerror($retcode)." ".$curl->errbuf."\n");
        return 0;
    }
}

sub init
{
    # set the proxy (Tor's SOCKS listener in my setup)
    $curl->setopt(CURLOPT_PROXY, "$host:".$port);
    $curl->setopt(CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4);
    # set the remaining options
    $curl->setopt(CURLOPT_USERAGENT, $useragent);
    $curl->setopt(CURLOPT_CONNECTTIMEOUT, $connectTimeOut);
    $curl->setopt(CURLOPT_TIMEOUT, $timeout);
    $curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);   # skip certificate verification
    $curl->setopt(CURLOPT_HEADER, 0);           # keep headers out of the body
}

1;
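To show how I use it, here is a sketch only; it assumes the package above is in the same file (or saved as Curl.pm) and that you then hand the HTML to HTML::TreeBuilder as mentioned above:
use strict;
use warnings;
use HTML::TreeBuilder;

my $body_ref = Curl::get("https://www.google.com");
if ($body_ref) {
    my $tree  = HTML::TreeBuilder->new_from_content($$body_ref);   # get() returns a scalar ref
    my $title = $tree->look_down(_tag => 'title');
    print $title->as_text, "\n" if $title;                         # e.g. print the page title
    $tree->delete;                                                 # free the parse tree
}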
Hope this will help you!
Maybe the proxy that you are using is already an HTTPS proxy (i.e., a CONNECT proxy). In that case this should work (untested):
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use LWP::Protocol::socks;
use LWP::Protocol::https;
use utf8;
my $mech = WWW::Mechanize->new(timeout => 60*5);
$mech->proxy(['http'], 'socks://localhost:9150');
$mech->proxy(['https'], 'https://localhost:9150'); ### <-- make https go over https-connect proxy
$mech->get("https://www.google.com");
I cannot find the original reference, but I fought with this a long time ago. Basically, the problem I had was with the implementation that LWP::UserAgent used for HTTPS requests.
Possibly this question can help you: How do I force LWP to use Crypt::SSLeay for HTTPS requests?
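For reference, the usual answer to that question is to make Net::HTTPS (which LWP uses under the hood) build its SSL sockets with Crypt::SSLeay's Net::SSL instead of IO::Socket::SSL. A minimal sketch, assuming Crypt::SSLeay is installed (it is in your list):
use strict;
use warnings;
use Net::SSL ();        # ships with Crypt::SSLeay
BEGIN {
    # must be set before Net::HTTPS is loaded, i.e. before WWW::Mechanize/LWP
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
}
use WWW::Mechanize;
Whether that combination plays nicely with LWP::Protocol::socks is something you would have to test; I only remember the SSL backend being the knob that mattered in my case.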