I'm developing a data mining application in JavaFX which relies on the WebView (and thus also the WebEngine). The mining happens in two steps. First, the user uses the UI to navigate to a website in the WebView and configures where the interesting data can be found. Second, a background task that runs periodically has a WebEngine load the same document and tries to extract the data from it.
This works perfectly in most cases, but recently I've run into trouble with pages that use AJAX to render content. To check whether the WebEngine has loaded the document, I listen to the stateProperty of its loadWorker. If the state transitions to SUCCEEDED, I know the document is loaded, together with any JavaScript that runs on document.ready() or equivalent. This is because, if I'm not mistaken, JavaScript is executed on the JavaFX application thread (source: https://blogs.oracle.com/javafx/entry/communicating_between_javascript_and_javafx). However, if an AJAX call is started, the JavaScript execution finishes and the engine reports the document as ready. It obviously is not ready, because the content may still change once the outstanding AJAX call returns.
Is there any way around this, i.e. to inject a hook so that I am notified when AJAX calls have finished? I've tried installing a default complete handler with $.ajaxSetup(), but that is quite dodgy. If an AJAX call supplies its own complete handler, the default one won't be called. Also, I can only inject the handler after the document has first loaded, and by then some AJAX calls may already be running. I've tested this injection with an upcall. It works fine for AJAX calls that are launched after the default handler is injected and that don't supply their own complete handler.
I'm looking for two things: first, a generic way to hook into the completion handler of AJAX calls, and second, a way to wait for the WebEngine to finish all AJAX calls and be notified afterwards.
I've also had this problem and solved it by providing my own subclass of sun.net.www.protocol.http.HttpURLConnection, which I use to process any AJAX requests. My class, conveniently called AjaxHttpURLConnection, hooks into the getInputStream() method, but it does not return the original input stream. Instead, I hand a PipedInputStream back to the WebEngine. I then read all the data coming from the original input stream and pass it on to my piped stream.
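The pass-through itself does not depend on any sun.* classes. Below is a minimal, stdlib-only sketch of the idea. PassThroughPipe, tee() and readAll() are hypothetical names used only for illustration. The onComplete callback stands in for whatever you want to do once the response has been fully received. One deliberate difference from the code further down: this sketch copies in chunks rather than reading the whole response into memory first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Hypothetical helper illustrating the pass-through pipe described above. */
final class PassThroughPipe {

    private PassThroughPipe() {}

    /**
     * Returns a stream that yields exactly the bytes of {@code original}. A daemon
     * thread copies the data into a pipe, keeps a copy, closes the pipe (so the
     * reader sees EOF) and then hands the copy to {@code onComplete}.
     */
    static InputStream tee(InputStream original, Consumer<byte[]> onComplete) {
        try {
            PipedOutputStream pipedOut = new PipedOutputStream();
            PipedInputStream pipedIn = new PipedInputStream(pipedOut);
            Thread copier = new Thread(() -> {
                ByteArrayOutputStream copy = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                try (InputStream in = original; PipedOutputStream out = pipedOut) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);  // forward to the reader (the WebEngine)
                        copy.write(buffer, 0, n); // keep a copy for ourselves
                    }
                } catch (IOException e) {
                    e.printStackTrace(); // the pipe is still closed by try-with-resources
                }
                // The request is complete: report whatever was received.
                onComplete.accept(copy.toByteArray());
            }, "pass-through-copier");
            copier.setDaemon(true);
            copier.start();
            return pipedIn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a stream to the end, wrapping IOExceptions for convenience. */
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
            return result.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The callback runs on the copier thread, not on the JavaFX application thread. In a JavaFX application you would probably pass the result on with Platform.runLater. The daemon flag keeps a stuck copier from blocking JVM shutdown.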
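This way, I gain two benefits. I get access to the raw response data, and I know exactly when the AJAX request has finished, namely when the original input stream has been read completely and the pipe has been closed.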
First, you have to tell Java to use your URLConnection implementation instead of the default one. To do so, you must provide your own URLStreamHandlerFactory. There are many threads on SO and elsewhere on this topic. To install your factory, put the following somewhere early in your main method. Note that URL.setURLStreamHandlerFactory() can only be called once per JVM; a second call throws an Error. This is what mine looks like:
import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import javafx.application.Application;

public class MyApplication extends Application {

    // ... (start(Stage) and the rest of the application)

    public static void main(String[] args) {
        // Install the factory before the WebEngine starts loading pages.
        URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
            @Override
            public URLStreamHandler createURLStreamHandler(String protocol) {
                if ("http".equals(protocol)) {
                    return new MyUrlConnectionHandler();
                }
                // Let the default handlers deal with everything else (e.g. https, jar, ...)
                return null;
            }
        });
        launch(args);
    }
}
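You may want to see the factory mechanism working on its own, without JavaFX or sun.* classes. The sketch below registers a handler for a made-up memory: protocol and returns null for everything else, just like the factory above. The protocol, MemoryProtocolDemo and its methods are hypothetical and serve only as an illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: a factory that handles a made-up "memory:" protocol. */
final class MemoryProtocolDemo {

    private static boolean installed = false;

    private MemoryProtocolDemo() {}

    /** A connection whose body is simply the URL's path, e.g. memory:hello -> "hello". */
    private static final class MemoryConnection extends URLConnection {
        MemoryConnection(URL url) {
            super(url);
        }

        @Override
        public void connect() {
            connected = true; // nothing to open for in-memory data
        }

        @Override
        public InputStream getInputStream() {
            connect();
            return new ByteArrayInputStream(getURL().getPath().getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Installs the factory; setURLStreamHandlerFactory may only succeed once per JVM. */
    static synchronized void install() {
        if (installed) {
            return;
        }
        URL.setURLStreamHandlerFactory(protocol -> {
            if ("memory".equals(protocol)) {
                return new URLStreamHandler() {
                    @Override
                    protected URLConnection openConnection(URL u) {
                        return new MemoryConnection(u);
                    }
                };
            }
            return null; // everything else goes to the JDK's default handlers
        });
        installed = true;
    }

    /** Opens the URL through the normal java.net API and returns the body as text. */
    static String fetch(String spec) {
        try (InputStream in = URI.create(spec).toURL().openStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                body.write(buffer, 0, n);
            }
            return new String(body.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The boolean guard in install() exists only because a second setURLStreamHandlerFactory() call would throw an Error.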
Second, we need our own Handler that tells the program which type of URLConnection to use for a given URL.
import java.io.IOException;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
// sun.* classes are JDK internals: on Java 9+ compiling and running this needs
// --add-exports java.base/sun.net.www.protocol.http=ALL-UNNAMED
import sun.net.www.protocol.http.Handler;

public class MyUrlConnectionHandler extends Handler {

    @Override
    protected URLConnection openConnection(URL url, Proxy proxy) throws IOException {
        // Simple heuristic: treat every request whose URL contains "ajax=1" as an AJAX call.
        if (url.toString().contains("ajax=1")) {
            return new AjaxHttpURLConnection(url, proxy, this);
        }
        // Otherwise return the default sun.net.www.protocol.http.HttpURLConnection.
        return super.openConnection(url, proxy);
    }
}
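The check url.toString().contains("ajax=1") also matches URLs such as ?xajax=1 or ?ajax=10, and it matches "ajax=1" anywhere in the path. If that becomes a problem, a stricter test might look only at the query parameters. AjaxUrls and isAjaxQuery are hypothetical names. The heuristic still assumes that the site puts ajax=1 into its AJAX URLs.

```java
/** Hypothetical stricter replacement for url.toString().contains("ajax=1"). */
final class AjaxUrls {

    private AjaxUrls() {}

    /**
     * True if the query string contains a parameter that is exactly "ajax=1".
     * Pass url.getQuery(), which is null when the URL has no query part.
     */
    static boolean isAjaxQuery(String query) {
        if (query == null || query.isEmpty()) {
            return false;
        }
        for (String param : query.split("&")) {
            if (param.equals("ajax=1")) {
                return true;
            }
        }
        return false;
    }
}
```

Inside openConnection() the condition would then read `if (AjaxUrls.isAjaxQuery(url.getQuery()))`.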
Last but not least, here is the AjaxHttpURLConnection itself:
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.net.Proxy;
import java.net.URL;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.commons.io.IOUtils;
import sun.net.www.protocol.http.Handler;
import sun.net.www.protocol.http.HttpURLConnection;

public class AjaxHttpURLConnection extends HttpURLConnection {

    private PipedInputStream pipedIn;
    private final ReentrantLock lock;

    protected AjaxHttpURLConnection(URL url, Proxy proxy, Handler handler) {
        super(url, proxy, handler);
        this.pipedIn = null;
        this.lock = new ReentrantLock(true);
    }

    @Override
    public InputStream getInputStream() throws IOException {
        lock.lock();
        try {
            // Do we have to set up our own input stream?
            if (pipedIn == null) {
                PipedOutputStream pipedOut = new PipedOutputStream();
                /*
                 * Careful here! For some reason, getInputStream() seems to call
                 * itself (no idea why). Therefore pipedIn must be set BEFORE
                 * calling super.getInputStream(), otherwise we run into a loop or
                 * into EOFExceptions. The lock is reentrant, so a nested call on
                 * the same thread simply returns pipedIn.
                 */
                pipedIn = new PipedInputStream(pipedOut);
                InputStream in = super.getInputStream();

                // TODO: timeout?
                Thread copier = new Thread(() -> {
                    byte[] data = null;
                    // try-with-resources closes both streams even if reading fails,
                    // so the WebEngine sees EOF instead of blocking on the pipe.
                    try (InputStream source = in; PipedOutputStream sink = pipedOut) {
                        // Pass the original data on to the browser.
                        data = IOUtils.toByteArray(source);
                        sink.write(data);
                        sink.flush();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    // Do something with the data? Decompress it if it was
                    // gzipped, for example (data is null if reading failed).
                    // Signal that the browser has finished.
                });
                copier.setDaemon(true);
                copier.start();
            }
        } finally {
            lock.unlock();
        }
        return pipedIn;
    }
}
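The code above leaves "Signal that the browser has finished" as a comment. For the second part of the question, waiting until all AJAX calls are done, one option is to count outstanding requests. The sketch below is hypothetical. You would call requestStarted() where the pipe is set up and requestFinished() after the copy completes. The handler creates connections without knowing about any particular browser, so a shared instance (e.g. a static one) would be needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical tracker for outstanding AJAX requests. Call requestStarted() when a
 * request is intercepted and requestFinished() once its data has been passed on;
 * onIdle runs each time the count drops back to zero.
 */
final class AjaxTracker {

    private final AtomicInteger pending = new AtomicInteger();
    private final Runnable onIdle;

    AjaxTracker(Runnable onIdle) {
        this.onIdle = onIdle;
    }

    void requestStarted() {
        pending.incrementAndGet();
    }

    void requestFinished() {
        // decrementAndGet is atomic, so only the thread that reaches zero fires onIdle.
        if (pending.decrementAndGet() == 0) {
            onIdle.run();
        }
    }

    int outstanding() {
        return pending.get();
    }
}
```

A count of zero only means that no intercepted request is in flight. The page's JavaScript may still be processing the last response, or it may start another request right away, so a short grace period before extracting data may be prudent. onIdle also runs on the copier thread, so in JavaFX you would likely wrap the work in Platform.runLater.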
If you have multiple WebEngine objects, it might be tricky to tell which one actually opened the URLConnection, and thus which browser has finished loading.

I only return an AjaxHttpURLConnection when the corresponding URL contains ajax=1. In my case, this was sufficient. I'm not too good with HTML and HTTP, though, so I don't know whether the WebEngine can make AJAX requests in some other way (e.g. recognizable only by header fields). If in doubt, you could simply always return an instance of the modified URL connection, but that would of course mean some overhead.

You can also intercept any data the WebEngine sends in a similar way. Just wrap the getOutputStream() method and place another intermediate stream in between. That stream grabs whatever is being sent and passes it on to the original output stream.