
HTMLUnit: Tons of obsolete content and can't create objects warnings on getPage() then fails with Exception invoking setOuterHTML on getByXPath()

Tags:

java

htmlunit

I'm trying out HtmlUnit to automate downloading data from a web app. However, I get a whole mess of warnings on getPage() (most of which seem to concern linked scripts that I don't think I even need), followed by a fatal com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML when I run getByXPath to pull the data I'm looking for. From the errors I get, I can't for the life of me figure out what's going on. Y'all got any ideas?

Here's my code:

import java.util.List;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScrapperApp {

    private static void go() throws Exception {
        HtmlPage nextPage;
        String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";

        final WebClient webclient = new WebClient();
        final HtmlPage page = webclient.getPage(url);

        System.out.println("PULLING LINKS:");

        List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//div[@class='hform1']/a[@class='lblentrylink']");

        /*for(int x=0; x<articles.size(); x++) {
            nextPage = articles.get(x).click();
            System.out.println(nextPage.getBody());
        }*/
    }

    public static void main(String[] args) throws Exception {
        go();
        System.out.println("COMPLETE");
    }

}

and here is my console output:

Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[443] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[448] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[456] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:52 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:53 PM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument execCommand
WARNING: Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/search/theethics.css' [1621:72] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/search/theethics.css' [1621:72] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/search/theethics.css' [1722:1] Error in style sheet. (Invalid token ".123". Was expecting one of: <EOF>, <S>, <IDENT>, "<!--", "-->", <HASH>, <IMPORT_SYM>, <PAGE_SYM>, <MEDIA_SYM>, ".", ":", "*", "[", <ATKEYWORD>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [4:1] Error in style rule. (Invalid token ".". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [4:1] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [538:16] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [538:16] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [6:1] Error in style rule. (Invalid token ".". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [6:1] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [105:17] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [105:17] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [160:16] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [160:16] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://media.ethics.ga.gov/Search/Telerik.Web.UI.WebResource.axd?_TSM_HiddenField_=ctl00_ContentPlaceHolder1_RadScriptManager1_TSM&compress=1&_TSM_CombinedScripts_=%3b%3bSystem.Web.Extensions%2c+Version%3d3.5.0.0%2c+Culture%3dneutral%2c+PublicKeyToken%3d31bf3856ad364e35%3aen-US%3a7263e9c6-5962-41bc-b839-88b704bfcf0d%3aea597d4b%3ab25378d2%3bTelerik.Web.UI%2c+Version%3d2011.2.915.35%2c+Culture%3dneutral%2c+PublicKeyToken%3d121fae78165ba3d4%3aen-US%3a168ec6eb-791b-4159-8a0f-6c601196f873%3a16e4e7cd%3af7645509%3a24ee1bba%3af46195d3%3a19620875%3a874f8ea2%3a490a9d4e%3abd8f85e4%3bAjaxControlToolkit%2c+Version%3d3.0.20820.16598%2c+Culture%3dneutral%2c+PublicKeyToken%3d28f01b0e84b6d53e%3aen-US%3a707835dd-fa4b-41d1-89e7-6df5d518ffb5%3ab14bb7d5%3a13f47f54%3a369ef9d0%3a1d056c78%3adc2d6e36%3a5acd2e8e%3af8a45328] line=[997] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:56 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
PULLING LINKS:
Jul 2, 2013 6:19:56 PM com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl runSingleJob
SEVERE: Job run failed with unexpected RuntimeException: Exception invoking setOuterHTML
======= EXCEPTION START ========
Exception class=[java.lang.RuntimeException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:663)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:559)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:525)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:594)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:569)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:996)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:101)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:328)
    at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:161)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.RuntimeException: Exception invoking setOuterHTML
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:163)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:287)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:359)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2659)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:509)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2364)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1601)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1595)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1248)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:815)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:415)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:274)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3132)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:107)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:587)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:651)
    ... 10 more
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
    at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1023)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement$ProxyDomNode.appendChild(HTMLElement.java:1091)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.handleCharacters(HTMLParser.java:710)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endDocument(HTMLParser.java:718)
    at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
    at org.cyberneko.html.HTMLTagBalancer.endDocument(HTMLTagBalancer.java:510)
    at org.cyberneko.html.filters.DefaultFilter.endDocument(DefaultFilter.java:213)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2116)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:818)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:162)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:121)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.parseHtmlSnippet(HTMLElement.java:1048)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1035)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:137)
    ... 26 more
Enclosed exception: 
java.lang.RuntimeException: Exception invoking setOuterHTML
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:163)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:287)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:359)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2659)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:509)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2364)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1601)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1595)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1248)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:815)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:415)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:274)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3132)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:107)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:587)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:651)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:559)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:525)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:594)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:569)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:996)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:101)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:328)
    at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:161)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
    at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1023)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement$ProxyDomNode.appendChild(HTMLElement.java:1091)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.handleCharacters(HTMLParser.java:710)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endDocument(HTMLParser.java:718)
    at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
    at org.cyberneko.html.HTMLTagBalancer.endDocument(HTMLTagBalancer.java:510)
    at org.cyberneko.html.filters.DefaultFilter.endDocument(DefaultFilter.java:213)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2116)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:818)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:162)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:121)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.parseHtmlSnippet(HTMLElement.java:1048)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1035)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:137)
    ... 26 more
== CALLING JAVASCRIPT ==

  function () {
      return b.apply(a, arguments);
  }

======= EXCEPTION END ========
COMPLETE
asked Jul 02 '13 22:07 by Jeff


1 Answer

The error comes from a MicrosoftAjax.js file. Try simulating Chrome:

final WebClient webclient = new WebClient(BrowserVersion.CHROME);

I also added a line to the code that suppresses the HtmlUnit warnings.
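To see why a single setLevel call is enough, here is a minimal sketch using only java.util.logging. It assumes HtmlUnit's messages end up in java.util.logging (the timestamped format of the console output in the question suggests they do). The class name QuietHtmlUnitLogs and its helper methods are made up for illustration; the child logger name is taken from the question's output.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietHtmlUnitLogs {

    // Keep a strong reference: java.util.logging only holds loggers weakly,
    // so a level set on an otherwise unreferenced logger could be lost
    // after garbage collection.
    private static final Logger HTMLUNIT_ROOT = Logger.getLogger("com.gargoylesoftware");

    /** Turns off every java.util.logging logger below com.gargoylesoftware. */
    static void silence() {
        HTMLUNIT_ROOT.setLevel(Level.OFF);
    }

    /** Reports whether the named logger would currently emit WARNING records. */
    static boolean warningsEnabled(String loggerName) {
        return Logger.getLogger(loggerName).isLoggable(Level.WARNING);
    }

    public static void main(String[] args) {
        // A logger name that appears in the question's console output.
        String child = "com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl";

        System.out.println("warnings enabled before: " + warningsEnabled(child));
        silence();
        System.out.println("warnings enabled after:  " + warningsEnabled(child));
    }
}
```

Loggers without a level of their own inherit from the nearest named ancestor, so switching off com.gargoylesoftware also covers the IncorrectnessListenerImpl, StrictErrorReporter and DefaultCssErrorHandler loggers seen in the output. Note that this hides SEVERE messages too, so it may be worth leaving logging on while debugging.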

Also, your XPath doesn't match anything (I tested it in Chrome), so I used a different one for example purposes:

import java.util.List;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScrapperApp {

    private static void go() throws Exception {
        /* turn off annoying htmlunit warnings */
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        HtmlPage nextPage;
        String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";

        final WebClient webclient = new WebClient(BrowserVersion.CHROME);
        final HtmlPage page = webclient.getPage(url);

        System.out.println("PULLING LINKS:");

        List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//a[@class='lblentrylink']");
        //List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//div[@class='hform1']/a[@class='lblentrylink']");

        for(int x=0; x<articles.size(); x++) {
            System.out.println("Clicking "+articles.get(x).asText());
            //nextPage = articles.get(x).click();
            //System.out.println(nextPage.getBody());
        }
    }
    public static void main(String[] args) throws Exception {
        go();
        System.out.println("COMPLETE");
    }
}
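As a rough illustration of how the two XPath expressions differ, the sketch below runs them against a small hand-written document with the JDK's built-in XPath engine. The markup is hypothetical and not copied from the real page, so it only shows the general behaviour: `/a` requires the anchor to be a direct child of the div, and `@class='hform1'` requires the class attribute to be exactly that string. The class name XPathScopeDemo and the third, more tolerant expression are my own additions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathScopeDemo {

    // Hypothetical markup (an assumption, not the real page): in the first div
    // the anchor is nested inside a span and the div has two class names.
    static final String SAMPLE =
          "<html><body>"
        + "<div class='hform1 wide'><span><a class='lblentrylink' href='#'>Report 1</a></span></div>"
        + "<div class='hform1'><a class='lblentrylink' href='#'>Report 2</a></div>"
        + "</body></html>";

    /** Returns how many nodes the expression selects in SAMPLE. */
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Strict: exact class string and direct child only.
        System.out.println(count("//div[@class='hform1']/a[@class='lblentrylink']"));
        // Loose: any matching anchor anywhere in the document.
        System.out.println(count("//a[@class='lblentrylink']"));
        // Middle ground: class token match on the div, anchor at any depth.
        System.out.println(count(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' hform1 ')]"
            + "//a[@class='lblentrylink']"));
    }
}
```

On this sample the strict expression selects only the second anchor, while the other two select both. If the looser expression turns out to match too much on the real page, the middle form is one way to keep the div restriction without depending on exact nesting.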
answered Oct 16 '22 18:10 by acdcjunior