
How to create HtmlUnit HTMLPage object from String?

Tags:

java

htmlunit

This question was asked once already, but I guess the API has changed and the answers are no longer valid.

URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url);
HtmlPage page = HTMLParser.parseHtml(response, new TopLevelWindow("top", new WebClient()));
System.out.println(page.getTitleText());

This can't be done because TopLevelWindow is protected, and extending or implementing a window just to work around that is ridiculous :)

Does anybody have an idea how to do this? It seems weird to me that it can't be done easily.

lisak asked May 26 '11 09:05


3 Answers

This code works in GroovyConsole:

@Grapes(
    @Grab(group='net.sourceforge.htmlunit', module='htmlunit', version='2.8')
)

import com.gargoylesoftware.htmlunit.*
import com.gargoylesoftware.htmlunit.html.*

URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url);
WebClient client = new WebClient()
HtmlPage page = HTMLParser.parseHtml(response, client.getCurrentWindow());
System.out.println(page.getTitleText());
Grooveek answered Nov 04 '22 11:11


With HtmlUnit 2.40, Grooveek's code won't compile; you get "Cannot make a static reference to the non-static method parseHtml(WebResponse, WebWindow) from the type HTMLParser". However, there is now a class HtmlUnitNekoHtmlParser that implements the HTMLParser interface, so the following code works:

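import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser;
import java.net.URL;

StringWebResponse response = new StringWebResponse(
    "<html><head><title>Test</title></head><body></body></html>",
    new URL("http://www.example.com"));
HtmlPage page = new HtmlUnitNekoHtmlParser().parseHtml(
    response, new WebClient().getCurrentWindow());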
mrzzmr answered Nov 04 '22 12:11


The FAQ has sample code for this: https://htmlunit.sourceforge.io/faq.html#HowToParseHtmlString

For example:

final String htmlCode = "<html>"
        + "  <head>"
        + "    <title>Title</title>"
        + "  </head>"
        + "  <body>"
        + "    content..."
        + "  </body>"
        + "</html> ";
// browserVersion is a BrowserVersion constant of your choice, e.g. BrowserVersion.CHROME
try (WebClient webClient = new WebClient(browserVersion)) {
    final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
    // work with the html page
}
RBRi answered Nov 04 '22 12:11