
How to get an HTML page using HtmlUnit

Tags: java, htmlunit

This may be a basic question, but I need to use HtmlUnit, and it only seems to return a page either as XML or as text.

I don't know how to get the plain HTML (the same source code that a browser shows).

I need this because I have to pass the HTML to some existing modules. Any ideas?

Afshin Moazami asked Feb 19 '12 22:02




1 Answer

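You can read the raw response body instead of the parsed page:

WebClient webClient = new WebClient();
// getPage returns the parsed page; its WebResponse keeps what the server sent
Page page = webClient.getPage("http://example.com");
WebResponse response = page.getWebResponse();
// The response body as a string, i.e. the original HTML source
String content = response.getContentAsString();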

See the Javadoc for the WebResponse#getContentAsString() method.
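For completeness, here is a minimal self-contained sketch of the same idea with imports and cleanup. It assumes a recent HtmlUnit 2.x release, where the classes live in com.gargoylesoftware.htmlunit and WebClient can be used in try-with-resources (in HtmlUnit 3.x the package is org.htmlunit). The class name RawHtmlFetcher and the helper fetchRawHtml are made up for illustration and are not part of HtmlUnit.

```java
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;

import java.io.IOException;

// Hypothetical helper class, not part of HtmlUnit.
class RawHtmlFetcher {

    // Returns the response body decoded to a string, rather than
    // HtmlUnit's re-serialized view of the parsed DOM.
    static String fetchRawHtml(WebClient webClient, String url) throws IOException {
        Page page = webClient.getPage(url);
        WebResponse response = page.getWebResponse();
        return response.getContentAsString();
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the client's windows and connections
        // (assumes a version in which WebClient is AutoCloseable).
        try (WebClient webClient = new WebClient()) {
            String html = fetchRawHtml(webClient, "http://example.com");
            System.out.println(html);
        }
    }
}
```

The difference from calling asXml() on an HtmlPage is that asXml() serializes the DOM HtmlUnit built after parsing, so it can differ from what the server sent, while the WebResponse content is the original response body.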

Dmytro Chyzhykov answered Sep 22 '22 19:09