The attached program (see the end of this post), when executed, yields the following output:
..........
with sleep time of 0ms
times= [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
average= 0.7
..........
with sleep time of 2000ms
times= [2, 2, 2, 2, 2, 1, 2, 2, 2, 2]
average= 1.9
In both cases the exact same code is executed: repeatedly getting the next value from a Random object instantiated once at the start of the program. The warm-up method executed first is supposed to trigger any JIT optimizations before the actual testing begins.
Can anyone explain the reason for this difference? I have been able to reproduce this result on my machine every time so far; it was executed on a multi-core Windows system with Java 7.
One interesting thing is that if the order in which the tests are executed is reversed, that is, if the loop with the delay is run before the loop without the delay, then the timings are more similar (with the no-delay loop actually taking longer):
..........
with sleep time of 2000ms
times= [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
average= 2.0
..........
with sleep time of 0ms
times= [2, 3, 3, 2, 3, 3, 2, 3, 2, 3]
average= 2.6
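(For reference, the reversed-order run is simply the original program with the two calls in main swapped:)
runOperationInALoop(numRepetitions, 2000); // delayed loop first
runOperationInALoop(numRepetitions, 0);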
As far as I can tell, no object is being created inside the operation method, and when running this through a profiler, garbage collection never seems to be triggered. A wild guess is that some value gets cached in a processor-local cache, which gets flushed out when the thread is put to sleep; when the thread wakes up it then needs to retrieve the value from main memory, which is much slower. That, however, does not explain why inverting the order makes a difference...
The real-life situation where I initially observed this behavior (and which prompted me to write this sample test class) was XML unmarshalling: I noticed that unmarshalling the same document repeatedly in quick succession yielded better times than doing the same thing with a delay between calls to unmarshal (whether the delay was generated through sleep or manually).
Here is the code:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Tester
{
    public static void main(String[] args) throws InterruptedException
    {
        warmUp(10000); // let the JIT do its work before timing anything
        int numRepetitions = 10;
        runOperationInALoop(numRepetitions, 0);
        runOperationInALoop(numRepetitions, 2000);
    }

    private static void runOperationInALoop(int numRepetitions, int sleepTime) throws InterruptedException
    {
        List<Long> times = new ArrayList<Long>(numRepetitions);
        long totalDuration = 0;
        for (int i = 0; i < numRepetitions; i++)
        {
            Thread.sleep(sleepTime);
            long before = System.currentTimeMillis();
            someOperation();
            long duration = System.currentTimeMillis() - before;
            times.add(duration);
            totalDuration = totalDuration + duration;
            System.out.print(".");
        }
        System.out.println();
        double averageTimePerOperation = totalDuration / (double) numRepetitions;
        System.out.println("with sleep time of " + sleepTime + "ms");
        System.out.println(" times= " + times);
        System.out.println(" average= " + averageTimePerOperation);
    }

    private static void warmUp(int warmUpRepetitions)
    {
        for (int i = 0; i < warmUpRepetitions; i++)
        {
            someOperation();
        }
    }

    public static int someInt;
    public static Random random = new Random(123456789L);

    private static void someOperation()
    {
        for (int j = 0; j < 50000; j++)
        {
            someInt = (random.nextInt() * 10) + 1; // nextInt() already returns an int, so no cast is needed
        }
    }
}
When you sleep for even a short period of time (you may find that even 10 ms is long enough), you give up the CPU, and the data, instruction, and branch-prediction caches are disturbed or even cleared. Even making a system call like System.currentTimeMillis(), or the much more accurate System.nanoTime(), can do this to a small degree.
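As an aside, since each call takes only a millisecond or two, System.nanoTime() will give you much finer-grained numbers than System.currentTimeMillis(). A sketch of the timing part of your loop rewritten with it (durationNanos is my name, not from your code):
long before = System.nanoTime();
someOperation();
long durationNanos = System.nanoTime() - before;
times.add(durationNanos / 1000L); // record microseconds instead of milliseconds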
AFAIK, the only way to avoid giving up the core is to busy-wait and to use thread affinity to lock your thread to a core. This minimises such disturbance and means your program can run 2-5x faster in low-latency situations, i.e. when sub-millisecond tasks matter.
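A minimal sketch of the busy-wait part (plain Java, no affinity library; busyWaitMillis is my name, for illustration only):
// Busy-wait instead of Thread.sleep: the thread never yields the core,
// so its caches stay warm, at the cost of burning CPU the whole time.
private static void busyWaitMillis(long millis)
{
    long deadline = System.nanoTime() + millis * 1_000_000L;
    while (System.nanoTime() - deadline < 0) // overflow-safe comparison
    {
        // spin
    }
}
If the cache explanation is right, replacing Thread.sleep(sleepTime) with busyWaitMillis(sleepTime) in your loop should bring the two runs much closer together.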
For your interest:
http://vanillajava.blogspot.co.uk/2012/01/java-thread-affinity-support-for-hyper.html
http://vanillajava.blogspot.co.uk/2012/02/how-much-difference-can-thread-affinity.html
When your thread goes to sleep, you're essentially saying to the JVM: this thread is doing nothing for the next X milliseconds. The JVM is likely at that point to wake up various background threads to do their thing (GC, for example), which may well cause updates to data stored in the processor cache. When your thread reawakens, some of its data may no longer be in the cache (fast) but may well have been shifted out to main memory (slow).
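You can at least test the GC part of this theory by running with GC logging switched on and seeing whether collections coincide with the sleeps (Tester is the class from the question):
java -verbose:gc Tester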
Take a look at http://mechanical-sympathy.blogspot.co.uk/ for more discussion of low-level caching effects.