Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What would cause HtmlUnit to throw 40+ Exceptions and fail when I try to load a page with javascript?

Tags:

java

htmlunit

I'm attempting to scrape the practice quizzes from my History Book's website so that I can assemble them all into one big test to study.

The page is here.

It is all driven by javascript, so I'm attempting to use HtmlUnit to scrape the page.

I'm more of a Python guy, so I set up my initial code to pretty closely mirror HtmlUnit's Getting Started section:

import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HelloWorld {
    public static void main(String[] args) throws Exception {
        homePage();
        System.out.println("Done.");
    }
    
    public static void homePage() throws Exception {
        final WebClient webClient = new WebClient();
    
        String url = "http://www.wwnorton.com/college/polisci/we-the-people8/shorter/ch/15/quiz.aspx";
        
        final HtmlPage page = webClient.getPage(url);

        System.out.println(page.asText());
        webClient.closeAllWindows();
    }
}

Upon running, I get the following print out: Apr 27, 2013 12:50:16 PM

com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0]
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'QuickTime.QuickTime'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'QuickTime.QuickTime'.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0]
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'wmplayer.ocx'.
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'wmplayer.ocx'.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0]
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0]
Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://www.wwnorton.com/college/polisci/we-the-people8/shorter/ScriptResource.axd?d=_NFxbFNZD5BQbaNY82unni-tpvKHIpx_DFI8m05N9H4ZnCF8k_zg2bVneHOOjAQ58itL8--3tACMpiC67WkR4iVWW6J5oqm5iyilArcp1bA4Jl6UUf2tHGjuNfP4BmYCDciCRxCM3FV_f5qrDZM2IQ2&t=fffffffff9cbe881] line=[164] lineSource=[null] lineOffset=[0]
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument execCommand
WARNING: Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/common/css/ss2.0/base.css' [848:2] Error in style rule. (Invalid token "!important". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/common/css/ss2.0/base.css' [848:2] Ignoring the following declarations in this rule.
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/reset-min.css' [57:1] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/reset-min.css' [57:1] Ignoring the following declarations in this rule.
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/base-min.css' [8:580] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/base-min.css' [8:580] Ignoring the following declarations in this rule.
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:55] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:55] Ignoring the following declarations in this rule.
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:237] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:237] Ignoring the following declarations in this rule.
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/college/polisci/we-the-people8/shorter/css/custom.css' [23:1] Error in style rule. (Invalid token "body". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/college/polisci/we-the-people8/shorter/css/custom.css' [23:1] Ignoring the following declarations in this rule.
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.wwnorton.com/common/colorblindcss/ss2.0/style.css' [1:51] Error in style rule. (Invalid token "<EOF>". Was expecting one of: <S>, <LBRACE>, <COMMA>, <HASH>, ".", ":", "[", <S>.)
Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.wwnorton.com/common/colorblindcss/ss2.0/style.css' [1:51] Ignoring the following declarations in this rule.
Exception in thread "main" ======= EXCEPTION START ========
Exception class=[java.lang.RuntimeException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:669)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:601)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:555)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1082)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:399)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:260)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:276)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:676)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:635)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3074)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2041)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:892)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:434)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:309)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359)
    at HelloWorld.homePage(HelloWorld.java:16)
    at HelloWorld.main(HelloWorld.java:7)
Caused by: java.lang.RuntimeException: Exception invoking setOuterHTML
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:148)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:295)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:368)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2796)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:521)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2479)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1569)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1564)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1253)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:405)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:275)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3031)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:546)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:654)
    ... 31 more
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
    at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1036)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:120)
    ... 47 more
Enclosed exception: 
java.lang.RuntimeException: Exception invoking setOuterHTML
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:148)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:295)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:368)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2796)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:521)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2479)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1569)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1564)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1253)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:405)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:275)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3031)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:546)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:654)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:601)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:555)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1082)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:399)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:260)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:276)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:676)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:635)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3074)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2041)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:892)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:434)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:309)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359)
    at HelloWorld.homePage(HelloWorld.java:16)
    at HelloWorld.main(HelloWorld.java:7)
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
    at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1036)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:120)
    ... 47 more
======= EXCEPTION END ========

What's causing all of these exceptions? Why am I (seemingly) not able to open the page?

Edit

I tried loading another page that has javascript driven menus and got similar output.

I then tried simply loading Yahoo.com.

String url = "http://www.yahoo.com";
        
final HtmlPage page = webClient.getPage(url);
        
System.out.println(page.asText());

No errors this time, but page.asText() returns nothing..

like image 838
Zack Yoshyaro Avatar asked Mar 24 '23 04:03

Zack Yoshyaro


1 Answers

The page you linked to uses a piece of Javascript that either triggers a bug in HtmlUnit (didn't find any open issue, though) or is just plain broken for non-Webkit browsers (page works fine in IE10, though).

Somewhere in this script, some elements' outerHTML properties are set to null unless the browser is "not Safari"(*):

if (!$telerik.isSafari) {
  c.outerHTML = null;
}
if (!$telerik.isSafari) {
  a.outerHTML = null;
}

(*) where "Safari" is revealed further downto actually mean "Webkit-based":

$telerik.isChrome = Sys.Browser.agent == Sys.Browser.Chrome;
$telerik.isSafari4 = Sys.Browser.agent == Sys.Browser.WebKit && Sys.Browser.version >= 526;
$telerik.isSafari3 = Sys.Browser.agent == Sys.Browser.WebKit && Sys.Browser.version < 526 && Sys.Browser.version > 500;
$telerik.isSafari2 = Sys.Browser.agent == Sys.Browser.Safari;
$telerik.isSafari = $telerik.isSafari2 || $telerik.isSafari3 || $telerik.isSafari4 || $telerik.isChrome;

I didn't bother to figure out what the obfuscated JS code wants to do with outerHTML, but it does fail a state check deep down in HtmlUnit's DOM manipulation code when executing the script:

java.lang.RuntimeException: Exception invoking setOuterHTML
  at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:669)
  ...
Caused by: java.lang.IllegalStateException: Previous sibling for
    HtmlDivision[<div style="...">] is null.
  at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1036)
  at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1067)
  at ...

Telling HtmlUnit to identify as Chrome avoids both outerHTML = null assignments

WebClient client = new WebClient(BrowserVersion.CHROME);
...

and produces sensible looking output:

Chapter 15: The Federal Courts | We the People, 8e: W. W. Norton StudySpace
W.W. Norton & Company
Colorblind Mode: On | Off W. W. NORTON HOME | HELP | CREDITS
We the People, 8e: A W. W. Norton StudySpace
...

This was a fun thing to track down.

like image 147
Philipp Reichart Avatar answered Apr 14 '23 21:04

Philipp Reichart