
Alternatives to Selenium Webdriver [closed]

I use the Selenium WebDriver for C# and for Python to extract data elements from websites, but the scraping is terribly slow. Scraping 35,000 data tables took me about 1.5 days. With the Selenium WebDriver I can execute JavaScript to retrieve a page element. Is there a library that does not require something like a WebDriver, yet can execute JavaScript on a web page to retrieve elements and can also click on elements? Or is there a faster alternative to Selenium?

Robert Smit asked Apr 16 '15 09:04


2 Answers

I suggest Selenium + PhantomJSDriver (GhostDriver), which is used for GUI-less browser automation. With it you can navigate through pages, select elements, submit forms, and do some scraping. JavaScript is also supported.

See the Selenium documentation for details. You will also need to download the phantomjs.exe file.

A good tutorial for PhantomJSDriver is given here.

Config of PhantomJSDriver (from the tutorial):

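import org.openqa.selenium.WebDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.DesiredCapabilities;

DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true); // not really needed: JS enabled by default
caps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "C://phantomjs.exe");
caps.setCapability("takesScreenshot", true);
WebDriver driver = new PhantomJSDriver(caps);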

Another option (this does not require WebDriver): PhantomJS

PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

This is GUI-less and also has the ability to take screenshots.

Example (from here):

var page = require('webpage').create();
page.open('http://example.com', function(status) {
  console.log("Status: " + status);
  if(status === "success") {
    page.render('example.png');
  }
  phantom.exit();
});
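
Since the question is about retrieving elements rather than taking screenshots, here is a minimal sketch of the same script using page.evaluate, which runs a function inside the loaded page and returns a serializable result. The h1 selector is only an assumption about what the target page contains; substitute the selector for the data you actually need.

```javascript
// Run with the PhantomJS binary, not Node: phantomjs scrape.js
// (scrape.js is a hypothetical file name)
var page = require('webpage').create();

page.open('http://example.com', function (status) {
  if (status === 'success') {
    // The callback executes in the page context, so it can use the DOM.
    // Only JSON-serializable values can be returned to this script.
    var heading = page.evaluate(function () {
      var h1 = document.querySelector('h1'); // assumed selector
      return h1 ? h1.textContent : null;
    });
    console.log('Heading: ' + heading);
  } else {
    console.log('Failed to load the page');
  }
  phantom.exit();
});
```

The function passed to page.evaluate cannot see variables from the outer script, so anything it needs has to be passed in as extra arguments to page.evaluate.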

PS: I would suggest JSoup for web scraping, but it does not support JavaScript. For Python, there is a comparable headless WebKit client called Ghost.py.

LittlePanda answered Sep 27 '22 17:09


I suggest you use TestCafe.

TestCafe is a free, open-source framework for functional (e2e) web testing. TestCafe is based on Node.js and doesn't use WebDriver at all.

TestCafe-powered tests are executed on the server side. To obtain DOM elements, TestCafe provides a powerful and flexible system of Selectors. TestCafe can execute JavaScript on the tested web page using the ClientFunction feature (see the TestCafe documentation).

TestCafe tests are very fast; see for yourself. The high speed of a test run does not affect stability, thanks to a built-in smart wait system.

Installation of TestCafe is very easy:

1) Check that you have Node.js on your PC (or install it).

2) To install TestCafe, open cmd and type:

npm install -g testcafe

Writing a test is not rocket science. Here is a quick start:

1) Copy and paste the following code into your text editor and save it as "test.js":

import { Selector } from 'testcafe';

fixture `Getting Started`
    .page `http://devexpress.github.io/testcafe/example`;

test('My first test', async t => {
    await t
        .typeText('#developer-name', 'John Smith')
        .click('#submit-button')
        .expect(Selector('#article-header').innerText).eql('Thank you, John Smith!');
});

2) Run the test in your browser (e.g. Chrome) by typing the following command in cmd:

testcafe chrome test.js

3) Get the descriptive result in the console output.
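
For scraping rather than testing, the Selectors and ClientFunction feature mentioned earlier can be used to read values out of the page. The following is a minimal sketch against the same example page; the getPageTitle name and the idea of logging the values are illustrative assumptions, not part of the original quick start.

```javascript
import { Selector, ClientFunction } from 'testcafe';

fixture `Reading page data`
    .page `http://devexpress.github.io/testcafe/example`;

// ClientFunction runs the given code in the browser and returns
// a serializable value to the test (getPageTitle is a hypothetical name).
const getPageTitle = ClientFunction(() => document.title);

test('Read values from the page', async t => {
    await t.typeText('#developer-name', 'John Smith');

    // Selector properties resolve asynchronously to the element's state.
    const typedName = await Selector('#developer-name').value;
    const title     = await getPageTitle();

    console.log(typedName, title);

    await t.expect(typedName).eql('John Smith');
});
```

Save it under any file name and run it the same way as the quick start, passing that file name to the testcafe command.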

TestCafe allows you to test against various browsers: local, remote (on devices, be it a browser for Raspberry Pi or Safari for iOS), cloud (e.g. Sauce Labs), or headless (e.g. Nightmare). This means that you can easily use TestCafe with your Continuous Integration infrastructure.

Helen Dikareva answered Sep 27 '22 17:09