I use the Selenium WebDriver for C# and for Python to extract data elements from websites, but the scraping is terribly slow: 35,000 data tables took me about 1.5 days. With the Selenium WebDriver I can execute JavaScript to get a DOM element. Is there a library that doesn't require something like a WebDriver to execute JavaScript on a web page, retrieve elements, and click on elements as well? Or is there a faster alternative to Selenium?
I suggest Selenium + PhantomJSDriver (GhostDriver), which is used for GUI-less browser automation. With this you can easily navigate through the pages, select elements, submit forms, and also perform some scraping. JavaScript is also supported.
You can go through the Selenium documentation here. You will have to download the phantomjs.exe file.
A good tutorial for PhantomJSDriver is given here.
Config of PhantomJSDriver (from the tutorial):
DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true); // not really needed: JS enabled by default
caps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "C://phantomjs.exe");
caps.setCapability("takesScreenshot", true);
WebDriver driver = new PhantomJSDriver(caps);
Another option (this one does not require WebDriver): PhantomJS
PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
This is GUI-less and also has the ability to take screenshots.
Example (from here):
var page = require('webpage').create();
page.open('http://example.com', function(status) {
  console.log("Status: " + status);
  if (status === "success") {
    page.render('example.png');
  }
  phantom.exit();
});
PS: I would suggest JSoup for web scraping, but it does not support JavaScript. For Python, there is a comparable headless WebKit client called Ghost.py.
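To illustrate the browser-free route hinted at in the PS: if the data tables are already present in the HTML the server returns (an assumption; this will not work when the tables are built by JavaScript after page load), a plain HTML parser can pull them out without any WebDriver. Below is a minimal sketch using only Python's standard `html.parser` module. The names `TableExtractor`, `extract_tables`, and `SAMPLE_HTML` are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of cell strings
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text fragments of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in an HTML string as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical sample input; in practice the HTML would come from an HTTP response.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_tables(SAMPLE_HTML))
```

In a real scraper the HTML string would be fetched first, for example with `urllib.request.urlopen`. Whether this ends up faster than Selenium for a given site depends on whether the pages need JavaScript at all.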
I suggest you use TestCafe.
TestCafe is a free, open-source framework for functional web testing (e2e testing). TestCafe is based on Node.js and doesn't use WebDriver at all.
TestCafe-powered tests are executed on the server side. To obtain DOM elements, TestCafe provides a powerful, flexible system of Selectors. TestCafe can execute JavaScript on the tested web page using the ClientFunction feature (see our Documentation).
TestCafe tests are really very fast; see for yourself. The high speed of the test run does not affect stability, thanks to a built-in smart wait system.
Installation of TestCafe is very easy:
1) Check that you have Node.js on your PC (or install it).
2) To install TestCafe open cmd and type in:
npm install -g testcafe
Writing a test is not rocket science. Here is a quick start:
1) Copy and paste the following code into your text editor and save it as "test.js":
import { Selector } from 'testcafe';

fixture `Getting Started`
    .page `http://devexpress.github.io/testcafe/example`;

test('My first test', async t => {
    await t
        .typeText('#developer-name', 'John Smith')
        .click('#submit-button')
        .expect(Selector('#article-header').innerText).eql('Thank you, John Smith!');
});
2) Run the test in your browser (e.g. Chrome) by typing the following command in cmd:
testcafe chrome test.js
3) Get the descriptive result in the console output.
TestCafe allows you to test against various browsers: local, remote (on devices, be it a browser for Raspberry Pi or Safari for iOS), cloud (e.g. Sauce Labs), or headless (e.g. Nightmare). This means that you can easily use TestCafe with your Continuous Integration infrastructure.