I'm creating a small app to measure how long it takes an HTML document to load, checking every x number of seconds.
I'm using jsoup in a loop:
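import java.io.IOException;
import java.net.SocketTimeoutException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

for (int i = 0; i < totalGets; i++) {
    Connection.Response response = null; // reset each round so a failed request never reports a stale status
    long startTime = System.currentTimeMillis();
    try {
        response = Jsoup.connect(url)
                .userAgent(USER_AGENT) // just using a Firefox user-agent
                .timeout(30_000)
                .execute();
    } catch (SocketTimeoutException e) {
        System.out.println("Request timed out after 30 seconds!");
    } catch (IOException e) {
        System.out.println("Request failed: " + e.getMessage());
    }
    long currentTime = System.currentTimeMillis();
    // response stays null when the request failed, so guard before reading the status code
    String status = (response != null) ? String.valueOf(response.statusCode()) : "n/a";
    System.out.println("Response time: " + (currentTime - startTime) + "ms" + "\tResponse code: " + status);
    sleep(2000); // pause between requests
}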
The issue I'm having is that the very first execution of the jsoup connection is always slower than all subsequent ones, no matter which website I use.
Here is my output on https://www.google.com
Response time: 934ms Response code: 200
Response time: 149ms Response code: 200
Response time: 122ms Response code: 200
Response time: 136ms Response code: 200
Response time: 128ms Response code: 200
Here is what I get on http://stackoverflow.com
Response time: 440ms Response code: 200
Response time: 182ms Response code: 200
Response time: 187ms Response code: 200
Response time: 193ms Response code: 200
Response time: 185ms Response code: 200
Why is it always faster after the first connect? Is there a better way to determine the document's load speed?
1. Jsoup has to run some boilerplate code before the first request can be fired. I would leave the first request out of your measurements, since all that initialization skews its time.
2. As mentioned in the comments, many websites cache responses for a couple of seconds. Depending on the website you want to measure, you can use some tricks to get the web server to produce a fresh page each time. One such trick is to add a timestamp parameter; usually _ is used for that (like http://url/path/?parameter1=val1&_=ts). Or you could send no-cache headers along with the HTTP request. However, none of these tricks can force a web server to behave the way you want. So you can instead wait longer than 30 seconds between requests.
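Both points can be sketched in a few lines. This is only a minimal sketch, not part of jsoup: the class `LoadTimeSketch`, the `Fetch` interface and the methods `withCacheBuster` and `measure` are names made up for illustration. It fires one unrecorded warm-up request, then times each later request against a URL that carries a fresh `_` parameter.

```java
final class LoadTimeSketch {

    /** Hypothetical callback that performs one request for the given URL. */
    @FunctionalInterface
    interface Fetch {
        void run(String url) throws Exception;
    }

    /** Appends a "_" timestamp parameter so caches are less likely to serve a stored copy. */
    static String withCacheBuster(String url, long timestamp) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "_=" + timestamp;
    }

    /** Runs one unmeasured warm-up request, then returns the time of each later request in ms. */
    static long[] measure(String url, int runs, Fetch fetch) throws Exception {
        // Warm-up: pays for class loading and other one-time setup, result is discarded.
        fetch.run(withCacheBuster(url, System.currentTimeMillis()));

        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            String freshUrl = withCacheBuster(url, System.currentTimeMillis());
            long start = System.nanoTime();
            fetch.run(freshUrl);
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }
}
```

With jsoup on the classpath, a call could look like `LoadTimeSketch.measure(url, 5, u -> Jsoup.connect(u).timeout(30_000).execute())`. The no-cache variant would add something like `.header("Cache-Control", "no-cache")` to that chain. As said above, the server is free to ignore either hint.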
In addition to @luksch's points, I think there is another factor: Java keeps the connection alive for a few seconds, which may save protocol round trips on later requests.
If you use .header("Connection", "close") you'll see more consistent times.
You can confirm that connections are kept alive with a packet sniffer. At least I can see the source port numbers being reused.
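To try the `Connection: close` idea without jsoup, here is a rough sketch that uses the JDK's own `HttpURLConnection` instead. The class and method names (`ConnectionCloseSketch`, `openWithoutKeepAlive`, `timedGet`) are made up for illustration, and the 30-second timeouts simply mirror the question. It needs Java 9 or later because of `readAllBytes`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

final class ConnectionCloseSketch {

    /** Prepares (but does not yet send) a request that asks for the socket to be closed afterwards. */
    static HttpURLConnection openWithoutKeepAlive(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Connection", "close"); // opt out of keep-alive for this request
        conn.setConnectTimeout(30_000);
        conn.setReadTimeout(30_000);
        return conn;
    }

    /** Times one complete request in milliseconds, from connect until the body has been read. */
    static long timedGet(String url) throws IOException {
        long start = System.nanoTime();
        HttpURLConnection conn = openWithoutKeepAlive(url);
        try (InputStream in = conn.getInputStream()) {
            in.readAllBytes(); // drain the body so the whole download is included
        } finally {
            conn.disconnect();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling `timedGet` in the same loop as in the question should make every request pay for a new TCP (and, for https, TLS) handshake, so the first and later timings become easier to compare.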
EDIT:
Another thing that may add time to the first request is the DNS lookup ...
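One way to get a feel for the DNS part is to time the lookup on its own. The sketch below is only an illustration under that assumption. `DnsTimingSketch` and `lookupMillis` are made-up names, and it measures `InetAddress.getByName`, not anything inside jsoup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class DnsTimingSketch {

    /** Returns how long resolving the host name took, in milliseconds. */
    static long lookupMillis(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getByName(host); // resolves the name (or parses a literal IP address)
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

If you call `lookupMillis("www.google.com")` twice in a row, the second call is usually much faster, because the JVM and the operating system cache successful lookups for a while. That would match the pattern in the question, where only the first request pays the extra time.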