Using Selenium WebDriver for Java, is it possible to get the HTML of a webpage given a specified URL?
I know that, once a webpage is loaded in a browser, the HTML can be obtained using WebDriver.getPageSource(). However, for improved efficiency, is it possible to obtain the HTML without loading the page in a browser first?
You can achieve this using headless browser.
A headless browser is a web-browser without a graphical user interface. This program will behave just like a browser but will not show any GUI.
Headless browsers are typically used in following situations :-
You have a central build tool which does not have any browser installed on it. So to do the basic level of sanity tests after every build you may use the headless browser to run your tests.
You want to write a crawler program that goes through different pages and collects data, headless browser will be your choice. Because you really don’t care about opening a browser. All you need is to access the webpages.
You would like to simulate multiple browser versions on the same machine. In that case you would want to use a headless browser, because most of them support simulation of different versions of browsers. We will come to this point soon.
Things to pay attention to before using headless browser
Headless browsers are simulation programs, they are not your real browsers. Most of these headless browsers have evolved enough to simulate, to a pretty close approximation, like a real browser. Still you would not want to run all your tests in a headless browser. JavaScript is one area where you would want to be really careful before using a Headless browser. JavaScript are implemented differently by different browsers. Although JavaScript is a standard but each browser has its own little differences in the way that they have implemented JavaScript. This is also true in case of headless browsers also. For example HtmlUnit headless browser uses the Rihno JavaScript engine which not being used by any other browser.
Some of the examples of Headless Drivers include
httpRequest in JAVA:
public static String executePost(String targetURL, String urlParameters) {
HttpURLConnection connection = null;
try {
//Create connection
URL url = new URL(targetURL);
connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type",
"application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Length",
Integer.toString(urlParameters.getBytes().length));
connection.setRequestProperty("Content-Language", "en-US");
connection.setUseCaches(false);
connection.setDoOutput(true);
//Send request
DataOutputStream wr = new DataOutputStream (
connection.getOutputStream());
wr.writeBytes(urlParameters);
wr.close();
//Get Response
InputStream is = connection.getInputStream();
BufferedReader rd = new BufferedReader(new InputStreamReader(is));
StringBuilder response = new StringBuilder();
String line;
while ((line = rd.readLine()) != null) {
response.append(line);
response.append('\r');
}
rd.close();
return response.toString();
} catch (Exception e) {
e.printStackTrace();
return null;
} finally {
if (connection != null) {
connection.disconnect();
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With