I have a URL, and I want to fetch its page source after the JavaScript on the page has executed.
Fetch Page source using HtmlUnit : URL got stuck
Initially I suspected that the fetch was getting stuck because of system resources and high CPU usage.
Then I tried HtmlUnit 2.9 and 2.11. Both got stuck while parsing. See the question linked above for the HtmlUnit scraping code that hangs.
Now I suspect that the JavaScript execution is going into an infinite loop.
I want to find out which JS files are causing the problem and exclude them from execution.
If they are scripts for services such as Google Analytics or Twitter, I may not need them at all.
So I want a way to tell HtmlUnit to ignore certain JS files and execute the rest.
Does anybody know how to do that?
Try this. It worked for me:
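import java.io.IOException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection;

class InterceptWebConnection extends FalsifyingWebConnection {
    // The superclass constructor installs this wrapper as the client's connection.
    public InterceptWebConnection(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Answer requests for the unwanted script with an empty body instead of fetching it.
        if (request.getUrl().toString().endsWith("dom-drag.js")) {
            return createWebResponse(request, "", "application/javascript", 200, "OK");
        }
        // Everything else goes through the real connection.
        return super.getResponse(request);
    }
}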
Then add the following line while setting up your webClient:
new InterceptWebConnection(webClient);
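If you need to skip more than one script (for example the analytics and Twitter widgets mentioned in the question), the URL check can be pulled out into a small helper. The following is only a sketch using the plain JDK: the class name ScriptBlocklist, the method shouldBlock, and the listed hosts and file names are assumptions made for illustration, not part of HtmlUnit.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: decides whether a script URL should be replaced by an empty response.
final class ScriptBlocklist {
    // Example entries only; replace them with the hosts and files you want to skip.
    private static final List<String> BLOCKED_HOSTS =
            Arrays.asList("google-analytics.com", "twitter.com");
    private static final List<String> BLOCKED_FILES =
            Arrays.asList("dom-drag.js");

    private ScriptBlocklist() {
    }

    static boolean shouldBlock(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath();

        // Match the host itself or any of its subdomains, but not look-alike names.
        for (String blocked : BLOCKED_HOSTS) {
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return true;
            }
        }
        // Match on the end of the path so query strings do not interfere.
        for (String file : BLOCKED_FILES) {
            if (path.endsWith(file)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getResponse, the condition would then become ScriptBlocklist.shouldBlock(request.getUrl().toString()). Matching on the parsed host and path rather than on the raw string keeps a URL such as dom-drag.js?v=2 blocked, which a plain endsWith on the whole URL would miss.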