HtmlUnit does not appear to close windows in the WebClient, which creates a memory leak. I am trying to get a page with HtmlUnit and pass it off to JSoup for parsing. I am aware that JSoup can connect to a page itself, but I need this approach because I have to hold a logged-in session on some sites before parsing them.
Here is the code:
import java.io.IOException;
import java.net.MalformedURLException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitLeakTest {
    public static void main(String args[]) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setPrintContentOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);
        for (int i = 0; i < 500; i++) {
            HtmlPage page = webClient.getPage("http://www.stackoverflow.com");
            Document doc = Jsoup.parse(page.asXml());
            webClient.closeAllWindows();
            System.out.println(i);
            if ((i % 5 == 0)) {
                System.out.println(i);
            }
        }
    }
}
As this runs, the memory continually climbs, and in my debug screen I can see that all the windows are still referenced under the WebClient and not closed.
I have seen this code around that is supposed to close these windows:
List<WebWindow> windows = webClient.getWebWindows();
for (WebWindow ww : windows) {
    ww.getJobManager().removeAllJobs();
    ww.getJobManager().shutdown();
}
webClient.closeAllWindows();
But alas it does not and I continue to have the memory leak.
Anyone experienced this issue?
Cheers
Version info:
HtmlUnit 2.15
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
I have a piece of code very similar to yours, and I've been pulling my hair out for the last 2 days trying to solve this. I tried everything mentioned on the web and could not find a solution, to the point where I started messing around with the code and suddenly the leak stopped. I was using a memory analyzer tool, and my program got to the point where it was using 2 GB of RAM (which I had set as the Java heap size in the JVM arguments), and then it crashed after 20 minutes. Now it has been running for 1 hour and the memory usage is stable at 10 MB.
What did I do? I've put the webClient initialization inside the for loop:
public class HtmlUnitLeakTest {
    public static void main(String args[]) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        for (int i = 0; i < 500; i++) {
            // Create a fresh client for every iteration; it is declared outside
            // the try block so that the finally block can still see it.
            WebClient webClient = initializeClient();
            try {
                HtmlPage page = webClient.getPage("http://www.stackoverflow.com");
                Document doc = Jsoup.parse(page.asXml());
                webClient.closeAllWindows();
                System.out.println(i);
                if ((i % 5 == 0)) {
                    System.out.println(i);
                }
            } finally {
                // Always release the client, even if getPage() throws.
                webClient.getCurrentWindow().getJobManager().removeAllJobs();
                webClient.close();
                System.gc();
            }
        }
    }

    // Builds a WebClient with the same options as in the question.
    private static WebClient initializeClient() {
        final WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setPrintContentOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);
        return webClient;
    }
}
I know it is theoretically the wrong approach, but I was desperate to get it working, and now it does! If you have already fixed the problem with a better (correct) approach, I would like to know about it too!
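For what it's worth, the "one client per iteration, always closed" lifecycle can also be written with try-with-resources, which avoids having to manage the variable's scope across try and finally by hand. The sketch below is only an illustration of that shape: `StubClient`, its `fetch` method and the `OPEN` counter are hypothetical stand-ins, not HtmlUnit classes. It assumes your HtmlUnit release has a `WebClient` that implements `AutoCloseable`; if it does, a `WebClient` could take the stub's place in the resource header.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PerIterationClientDemo {
    /** Hypothetical stand-in for a browser client; tracks how many instances are still open. */
    static final class StubClient implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        StubClient() {
            OPEN.incrementAndGet();
        }

        /** Pretends to download a page and returns some markup. */
        String fetch(String url) {
            return "<html><body>" + url + "</body></html>";
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Creates one client per iteration and returns how many are still open afterwards. */
    static int run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            // Declared in the resource header, so close() runs even if fetch() throws.
            try (StubClient client = new StubClient()) {
                String html = client.fetch("http://www.stackoverflow.com");
                if (html.isEmpty()) {
                    throw new IllegalStateException("empty page");
                }
            }
        }
        return StubClient.OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("clients still open: " + run(500));
    }
}
```

The point of the counter is only to show that no client outlives its iteration; it says nothing about what HtmlUnit itself retains internally, so it is still worth watching the heap with a profiler.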