 

Can't open an HTTPS website using SlimerJS, CasperJS, or PhantomJS

This is the first time I have been unable to open a website with a headless browser such as PhantomJS, SlimerJS, or CasperJS. I just want to open the website, so I wrote a very basic script that opens it and takes a screenshot, but all three of them give me a blank image.

I tried these options:

--debug=true
--ssl-protocol=TLSv1.2 (I tried each of the available protocols)
--ignore-ssl-errors=true

Here are my scripts:

SlimerJS

var page = require("webpage").create();
page.open("https://domain/")
    .then(function(status){
        if (status == "success") {
            page.viewportSize = { width:1024, height:768 };
            page.render('screenshot.png');
        }
        else {
            console.log("Sorry, the page is not loaded");
        }
        page.close();
        phantom.exit();
    });

PhantomJS

var page = require('webpage').create();
page.open('https://domain/', function() {
  page.render('screenshot.png');
  phantom.exit();
});

CasperJS

var casper = require('casper').create({
  viewportSize: {width: 950, height: 950}
});

casper.start('https://domain/', function() {
    this.capture('screenshot.png');
});

casper.run();

I even tried online screenshot services to see whether they could open the site, but all of them returned nothing as well.

Is there something I am missing?

plonknimbuzz asked Apr 15 '18 07:04


2 Answers

The issue is not caused by PhantomJS as such. The site you are checking is protected by F5 network protection:

https://devcentral.f5.com/articles/these-are-not-the-scrapes-youre-looking-for-session-anomalies

So it's not that the page doesn't load. Rather, the protection mechanism detects that PhantomJS is a bot, based on checks it has implemented.
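One way to see what the server actually sends back is to log each response from inside PhantomJS. The sketch below is only an assumption about how one might check this, not something taken from the answer: `formatResponse` is a helper name made up here, `diagnose.js` is a hypothetical file name, and `https://domain/` is the placeholder URL from the question. It relies on PhantomJS's `onResourceReceived` and `onResourceError` callbacks.

```javascript
// PhantomJS diagnostic sketch; run with: phantomjs diagnose.js
// (diagnose.js is a hypothetical file name).

// Format one network response as "<status> <url>" for logging.
function formatResponse(response) {
  return String(response.status) + ' ' + response.url;
}

// Only run the browser part when executed by PhantomJS itself.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();

  // Log every completed response so a 403 or a challenge page shows up.
  page.onResourceReceived = function (response) {
    if (response.stage === 'end') {
      console.log(formatResponse(response));
    }
  };

  // Log network-level failures such as SSL handshake errors.
  page.onResourceError = function (err) {
    console.log('ERROR ' + err.errorCode + ' ' + err.errorString + ' ' + err.url);
  };

  page.open('https://domain/', function (status) {
    console.log('load status: ' + status);
    console.log('body length: ' + page.content.length);
    phantom.exit();
  });
}
```

If the main document comes back with a normal status but the body is very short or contains only a script challenge, that would be consistent with a bot check rather than an SSL failure.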


The easiest fix is to use Chrome instead of PhantomJS. Otherwise, expect to spend a decent amount of time investigating.

Some similar questions from the past, both answered and unanswered:

Selenium and PhantomJS : webpage thinks Javascript is disabled

PhantomJS get no real content running on AWS EC2 CentOS 6

file_get_contents while bypassing javascript detection

Python POST Request Not Returning HTML, Requesting JavaScript Be Enabled

I will update this post with any further details I find. But in my experience, it is better to go with what works than to waste time on sites like this that don't work under PhantomJS.

Update-1

I tried importing the browser cookies into PhantomJS and it still doesn't work, which means there are some hard checks in place.


Tarun Lalwani answered Nov 13 '22 23:11


I experienced this issue with PhantomJS, and the following service args resolved it:

--ignore-ssl-errors=true
--ssl-protocol=any
--web-security=false
--proxy-type=None

I can't help you with CasperJS and SlimerJS, and I don't know exactly why this worked.

sudonym answered Nov 13 '22 22:11