
Why does the Java Scheduler exhibit significant time drift on Windows?

Tags: java, time

I have a Java service on Windows 7 that runs once per day on a SingleThreadScheduledExecutor. I had never given it much thought, as it's non-critical, but I recently looked at the numbers and saw that the service was drifting by approximately 15 minutes per day. That sounded like far too much, so I dug into it.

// lastTimeStamp is a field; seconds is the period in seconds (10 in this run)
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
   long now = System.currentTimeMillis();
   long drift = now - lastTimeStamp - seconds * 1000; // ms beyond the expected period
   lastTimeStamp = now;
}, 0, seconds, TimeUnit.SECONDS);

This method pretty consistently drifts by +110ms per 10 seconds. If I run it on a 1-second interval, the drift averages +11ms.

Interestingly, if I do the same with a Timer, the values are pretty consistent, with an average drift of less than a full millisecond.

new Timer().schedule(new TimerTask() {
    @Override
    public void run() {
        long drift = (System.currentTimeMillis() - lastTimeStamp - seconds * 1000);
        lastTimeStamp = System.currentTimeMillis();
    }
}, 0, seconds * 1000);

Linux: doesn't drift (neither with the Executor nor with the Timer)
Windows: drifts like crazy with the Executor, doesn't with the Timer

Tested with Java8 and Java11.

Interestingly, if you assume a drift of 11ms per second you'll get 950400ms drift per day which amounts to 15.84 minutes per day. So it's pretty consistent.
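The extrapolation above can be checked with a few lines of Java. This is only a sketch of the arithmetic; the class and method names are made up for illustration.

```java
import java.util.Locale;

class DriftEstimate {
    // Extrapolates a per-second drift (in milliseconds) to a full day.
    static double driftPerDayMillis(double driftPerSecondMillis) {
        long secondsPerDay = 24L * 60 * 60; // 86,400 seconds
        return driftPerSecondMillis * secondsPerDay;
    }

    public static void main(String[] args) {
        double perDayMillis = driftPerDayMillis(11.0);
        // Convert milliseconds to minutes for readability.
        System.out.printf(Locale.ROOT, "%.0f ms per day = %.2f minutes per day%n",
                perDayMillis, perDayMillis / 60_000.0);
    }
}
```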

The question is: why?
Why would this happen with a SingleThreadExecutor but not with a Timer?

Update 1: following Slaw's comment, I tried several different machines. What I found is that the issue doesn't manifest on any personal hardware, only on company hardware. On company hardware it also manifests on Windows 10, though an order of magnitude less.

Asked by Frankie, Jun 12 '19 23:06


2 Answers

As pointed out in the comments, the ScheduledThreadPoolExecutor bases its calculations on System.nanoTime(). For better or worse, the old Timer API preceded nanoTime(), and so uses System.currentTimeMillis() instead.

The difference here might seem subtle, but it is more significant than one might expect. Contrary to popular belief, nanoTime() is not just a "more accurate version" of currentTimeMillis(). currentTimeMillis() is locked to the system time, whereas nanoTime() is not. Or as the docs put it:

This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. [...] The values returned by this method become meaningful only when the difference between two such values, obtained within the same instance of a Java virtual machine, is computed.

In your example, you're not following this guidance for the values to be "meaningful", understandably, because the ScheduledThreadPoolExecutor only uses nanoTime() as an implementation detail. But the end result is the same: you can't guarantee that it will stay synchronised with the system clock.
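One way to see whether the two clocks disagree on a given machine is to measure the same interval with both. The following is a minimal sketch, not a benchmark: the class and method names are invented for illustration, and the result on short intervals is dominated by the coarse granularity of currentTimeMillis(), so longer sample windows give a more useful picture.

```java
class ClockComparison {
    // Measures one sleep interval with both clocks and returns
    // (wall-clock elapsed - nanoTime elapsed) in milliseconds.
    static double divergenceMillis(long sampleMillis) {
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();
        try {
            Thread.sleep(sampleMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        long monoElapsedNanos = System.nanoTime() - monoStart;
        long wallElapsedMillis = System.currentTimeMillis() - wallStart;
        return wallElapsedMillis - monoElapsedNanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // A few short samples; a consistent non-zero sign hints at drift between the clocks.
        for (int i = 0; i < 3; i++) {
            System.out.println("divergence (ms): " + divergenceMillis(500));
        }
    }
}
```

If the divergence keeps the same sign and grows with the sample window, the executor's notion of "10 seconds" will differ from the wall clock's by the same proportion.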

But why not? Seconds are seconds, right, so the two should stay in sync from a certain, known point?

Well, in theory, yes. But in practice, probably not.

Taking a look at the relevant native code on Windows:

LARGE_INTEGER current_count;
QueryPerformanceCounter(&current_count);
double current = as_long(current_count);
double freq = performance_frequency;
jlong time = (jlong)((current/freq) * NANOSECS_PER_SEC);
return time;

We see that nanoTime() uses the QueryPerformanceCounter API, which returns the number of "ticks" of a counter whose frequency is reported by QueryPerformanceFrequency. That frequency stays the same, but the timer it's based on, and the synchronisation algorithm that Windows uses, vary by configuration, OS and underlying hardware. Even ignoring the above, it's never going to be close to 100% accurate (it's based on a reasonably cheap crystal oscillator somewhere on the board, not a caesium time standard!), so it's going to drift away from the system time as NTP keeps the latter in sync with reality.

In particular, this link gives some useful background, and reinforces the above point:
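To make the effect of that conversion concrete, here is a Java sketch that mirrors the arithmetic of the native snippet above. The frequency and tick counts are hypothetical numbers, picked only so that the error lands near the 11 ms per second reported in the question; this is not a claim about what the asker's hardware was actually doing.

```java
class TickConversion {
    static final long NANOSECS_PER_SEC = 1_000_000_000L;

    // Same arithmetic as the native code: ticks / frequency gives seconds,
    // which is then scaled to nanoseconds.
    static long ticksToNanos(long ticks, long ticksPerSecond) {
        double current = (double) ticks;
        double freq = (double) ticksPerSecond;
        return (long) ((current / freq) * NANOSECS_PER_SEC);
    }

    public static void main(String[] args) {
        long nominalFrequency = 10_000_000L;        // hypothetical reported frequency (10 MHz)
        long actualTicksPerRealSecond = 9_890_000L; // hypothetical counter running 1.1% slow

        // What nanoTime() would report after one real second has passed.
        long reported = ticksToNanos(actualTicksPerRealSecond, nominalFrequency);
        System.out.println("reported ns after one real second: " + reported);

        // Real time that passes while the counter advances by one "second".
        double realSecondsPerReportedSecond =
                (double) nominalFrequency / actualTicksPerRealSecond;
        System.out.println("real seconds per reported second: " + realSecondsPerReportedSecond);
    }
}
```

Under these assumed numbers, a scheduler waiting for one second of counter time would wait roughly 11 ms longer than one second of real time, and the error accumulates with every period.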

When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter.

(Bolding is mine.)

For your specific case of Windows 7 performing badly, note that in Windows 8+ the TSC synchronisation algorithm was improved, and QueryPerformanceCounter is always based on a TSC (as opposed to Windows 7, where it could be a TSC, HPET or the ACPI PM timer, the last of which is especially inaccurate). I suspect this is the most likely reason the situation improves tremendously on Windows 10.

That being said, the above factors still mean that you can't rely on the ScheduledThreadPoolExecutor to keep in time with "real" time - it will always drift. If that drift is an issue, then it's not a solution you can rely on in this context.
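If staying aligned with the wall clock matters more than an exact period, one possible workaround is to stop using scheduleAtFixedRate and instead schedule each run individually, recomputing the delay from System.currentTimeMillis() every time. The sketch below illustrates that idea under stated assumptions; the class and helper names are hypothetical, and it is not what ScheduledThreadPoolExecutor or any particular library does internally.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class WallClockScheduler {
    // Milliseconds from nowMillis until the next wall-clock multiple of periodMillis.
    static long delayToNextTick(long nowMillis, long periodMillis) {
        return periodMillis - Math.floorMod(nowMillis, periodMillis);
    }

    // Schedules a single run, then re-schedules itself from the wall clock,
    // so any error in one delay is corrected on the next run instead of accumulating.
    static void scheduleNext(ScheduledExecutorService executor, long periodMillis, Runnable task) {
        long delay = delayToNextTick(System.currentTimeMillis(), periodMillis);
        executor.schedule(() -> {
            try {
                task.run();
            } finally {
                if (!executor.isShutdown()) {
                    scheduleNext(executor, periodMillis, task);
                }
            }
        }, delay, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        scheduleNext(executor, 1_000,
                () -> System.out.println("tick at " + System.currentTimeMillis()));
        Thread.sleep(3_500); // let a few ticks fire, then stop
        executor.shutdownNow();
    }
}
```

Each individual delay is still measured with nanoTime() inside the executor, so a single run can be off by the drift accumulated over one period, but that error does not build up over a day. One caveat: if a run fires slightly before the wall-clock boundary, the recomputed delay is very short and the task can fire twice in quick succession, so a production version would need a guard for that.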

Side note: in Windows 8+, there is a GetSystemTimePreciseAsFileTime function which offers the high resolution of QueryPerformanceCounter combined with the accuracy of the system time. If Windows 7 were dropped as a supported platform, this could in theory be used to provide a System.getCurrentTimeNanos() method or similar, assuming comparable native functions exist on the other supported platforms.

Answered by Michael Berry, Oct 17 '22 03:10


CronScheduler is a project of mine designed to be proof against the time drift problem, while also avoiding some of the problems with the old Timer class described in this post.

Example usage:

Duration syncPeriod = Duration.ofMinutes(1);
CronScheduler cron = CronScheduler.create(syncPeriod);
cron.scheduleAtFixedRateSkippingToLatest(0, 1, TimeUnit.MINUTES, runTimeMillis -> {
    // Collect and send summary metrics to a remote monitoring system
});

Note: this project was actually inspired by this StackOverflow question.

Answered by leventov, Oct 17 '22 03:10