 

Skip particular JavaScript execution in HtmlUnit

Tags:

htmlunit

I have a URL. I want to fetch the page source of the URL after its JavaScript has executed.

Fetch Page source using HtmlUnit : URL got stuck

Initially I suspected that the URL was getting stuck because of limited system resources and high CPU usage.

Then I tried running it on HtmlUnit 2.9 and 2.11. It got stuck on both while parsing. Refer to the question above for the HtmlUnit code snippet that is getting stuck.

Now I suspect that this might be due to JS execution going into an infinite loop.

I want to find out which JS files are causing the problem and exclude them from execution.

If they are scripts for services like Google Analytics, Twitter, etc., I may not need them at all.

So I want a way to tell HtmlUnit to ignore certain JS files and execute the rest.

Does anybody know how to do that?

asked Jan 21 '13 at 13:01 by Learn More


1 Answer

Try this. It worked for me:

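import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection;

// Wraps the WebClient's connection and serves an empty body for selected scripts.
class InterceptWebConnection extends FalsifyingWebConnection {
    public InterceptWebConnection(WebClient webClient) throws IllegalArgumentException {
        // Installs this wrapper as the WebClient's web connection
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Check the URL path (no query string) before fetching, so the script is never downloaded
        if (request.getUrl().getPath().endsWith("dom-drag.js")) {
            return createWebResponse(request, "", "application/javascript", 200, "Ok");
        }
        // Every other request goes through the wrapped connection exactly once
        return super.getResponse(request);
    }
}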

Then add the following line where you set up your WebClient (the constructor registers the wrapper as the client's connection):

new InterceptWebConnection(webClient);
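The question mentions skipping scripts from services such as Google Analytics and Twitter, while the `getResponse` override above only checks a single file name. The following is a minimal, stdlib-only sketch of a hypothetical `ScriptBlocklist` helper (the class, its method, and the example hosts are illustrative assumptions, not part of HtmlUnit) that decides whether a script URL should be skipped, by host (including subdomains) or by file name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not an HtmlUnit API: decides whether a script URL should be skipped.
final class ScriptBlocklist {
    private final List<String> blockedHosts;     // e.g. "google-analytics.com"
    private final List<String> blockedFileNames; // e.g. "dom-drag.js"

    ScriptBlocklist(List<String> blockedHosts, List<String> blockedFileNames) {
        this.blockedHosts = new ArrayList<>(blockedHosts);
        this.blockedFileNames = new ArrayList<>(blockedFileNames);
    }

    /** Returns true if the URL's host (or a parent domain) or its file name is blocked. */
    boolean isBlocked(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // unparseable URLs are left alone
        }
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        for (String blocked : blockedHosts) {
            String b = blocked.toLowerCase(Locale.ROOT);
            // Match the domain itself or any subdomain, e.g. www.google-analytics.com
            if (host.equals(b) || host.endsWith("." + b)) {
                return true;
            }
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String fileName : blockedFileNames) {
            // getPath() excludes the query string, so "dom-drag.js?v=2" still matches
            if (path.equals(fileName) || path.endsWith("/" + fileName)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside `getResponse`, the `endsWith("dom-drag.js")` condition could then become `blocklist.isBlocked(request.getUrl().toString())`, with `blocklist` held as a field of the wrapper. Matching on whole host labels and whole file names avoids accidentally blocking, say, `nottwitter.com` or `my-dom-drag.js`.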
answered Nov 17 '22 at 20:11 by Kunal Kishore