
OutOfMemoryError while using HtmlUnit for scraping

I am using HtmlUnit to log in to a site and then download data from a table.

When I run my code, it throws java.lang.OutOfMemoryError and cannot continue.

Following is my code:

WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_6);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setRedirectEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setTimeout(50000);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setPopupBlockerEnabled(true);

HtmlPage htmlPage=webClient.getPage(url);
Thread.sleep(200);
// Log in
HtmlTextInput uname=(HtmlTextInput)htmlPage.getFirstByXPath("//*[@id=\"username\"]");
uname.setValueAttribute("xxx");
HtmlPasswordInput upass=(HtmlPasswordInput)htmlPage.getFirstByXPath("//*[@id=\"password\"]");
upass.setValueAttribute("xxx");
HtmlSubmitInput submit=(HtmlSubmitInput)htmlPage.getFirstByXPath("//*[@id=\"login-button\"]/input");
htmlPage=(HtmlPage) submit.click();
Thread.sleep(200);
webClient.waitForBackgroundJavaScript(10000);
// Poll until the "Loading..." placeholder disappears (at most 250 * 500 ms)
for (int i = 0; i < 250; i++) {
    if (!htmlPage.asText().contains("Loading...")) {
        break;
    }
    synchronized (htmlPage) {
        htmlPage.wait(500);
    }
}

System.out.println(htmlPage.asText());

The following is the stack trace:

java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.htmlunit.corejs.javascript.Node.newString(Node.java:155)
at net.sourceforge.htmlunit.corejs.javascript.Node.newString(Node.java:151)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.createPropertyGet(IRFactory.java:1990)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:968)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:964)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:964)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformFunctionCall(IRFactory.java:595)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:86)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformInfix(IRFactory.java:775)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:161)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformAssignment(IRFactory.java:368)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:152)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformExprStmt(IRFactory.java:488)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:149)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:762)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:762)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:768)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformFunction(IRFactory.java:560)

I have added the following lines to catalina.sh to allocate heap memory, but I still get the same error (my machine has 2 GB of RAM).

if [ -z "$LOGGING_MANAGER" ]; then
     JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
else
     JAVA_OPTS="$JAVA_OPTS $LOGGING_MANAGER"
fi

# Uncomment the following line to make the umask available when using the
# org.apache.catalina.security.SecurityListener
   JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"
   JAVA_OPTS="$JAVA_OPTS  -Xms512m -Xmx2048m -XX:MaxPermSize=512m"
   JAVA_OPTS="-server -XX:+UseConcMarkSweepGC"
Kunal Kishore asked Apr 05 '13 09:04


2 Answers

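Your last line assigns JAVA_OPTS without including $JAVA_OPTS, so it overwrites the heap settings from the line before it. Include $JAVA_OPTS in the last line and your code may work: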

JAVA_OPTS="$JAVA_OPTS -server -XX:+UseConcMarkSweepGC"
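
To confirm that the heap setting actually reaches the JVM, a small check like the following can be run inside the same process (for example, called from the web application). This is only a sketch: the class name HeapCheck and its method names are made up for illustration, and it uses only standard JDK calls.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of HtmlUnit or Tomcat) that reports
// the heap limit and the flags the running JVM was started with.
final class HeapCheck {

    // Maximum heap the JVM will try to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Arguments passed to the JVM at startup, such as -Xmx2048m.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If -Xmx2048m does not appear in the printed arguments, the JAVA_OPTS value was most likely overwritten before the JVM started.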
Rajendra_Prasad answered Nov 13 '22 10:11


I would set -XX:+HeapDumpOnOutOfMemoryError and then use a tool like Eclipse MAT.
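
Whether that flag is active in the running JVM can be checked from code. The sketch below assumes a HotSpot-based JVM, since com.sun.management.HotSpotDiagnosticMXBean is not available on every JVM; the class name HeapDumpFlagCheck is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the current value of the
// HeapDumpOnOutOfMemoryError option (assumes a HotSpot JVM).
final class HeapDumpFlagCheck {

    // Returns "true" or "false" depending on whether the flag is set.
    static String heapDumpOnOomValue() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOomValue());
    }
}
```

When the value is "true", the JVM writes an .hprof file on the next OutOfMemoryError, which can then be opened in Eclipse MAT.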

Philippe Marschall answered Nov 13 '22 09:11