
How can JVM jitter get caused by a for loop with no object allocations?

I have been micro-benchmarking the following code and noticed something interesting that I am hoping somebody can shed more light on. It leads to a situation where it looks like a for loop can carry on running fast while blocking other threads within the JVM. If that is true then I would like to understand why; if it is not true then any insight into what I may be missing would be appreciated.

To build up the situation, let me walk you through the benchmark that I am running and its results.

The code is pretty simple: iterate over every element in an array, summing its contents, and repeat 'targetCount' times.

public class UncontendedByteArrayReadBM extends Benchmark {

    private int arraySize;
    private byte[] array;

    public UncontendedByteArrayReadBM( int arraySize ) {
        super( "array reads" );

        this.arraySize = arraySize;
    }

    @Override
    public void setUp() {
        super.setUp();

        array = new byte[arraySize];
    }

    @Override
    public void tearDown() {
        array = null;
    }

    @Override
    public BenchmarkResult invoke( int targetCount ) {
        long sum = 0;
        for ( int i=0; i<targetCount; i++ ) {
            for ( int j=0; j<arraySize; j++ ) {
                sum += array[j];
            }
        }

        return new BenchmarkResult( ((long)targetCount)*arraySize, "uncontended byte array reads", sum );
    }

}

On my 4-core, 2GHz Intel Sandy Bridge i7 laptop, running Java 6 (Oracle JVM) on OS X, this code repeatedly runs at:

2.626852686364034 uncontended byte array reads/ns [totalTestRun=3806.837ms]

(I have culled the repeated runs used to warm the JVM)

This result appears reasonable to me.

Where this got interesting is when I started measuring JVM jitter. To do that I start a background daemon thread that sleeps for 1ms and then measures how much longer than 1ms it was actually asleep. I also changed the report to print out the max jitter for each repeated test run.

2.6109858273078306 uncontended byte array reads/ns [maxJitter=0.411ms totalTestRun=3829.971ms]

To get an idea of 'normal' jitter for my environment, I monitor the jitter with no work going on before starting the actual test runs; readings such as the following are typical (all in ms). Thus a jitter of 0.411ms is normal and not so interesting.

getMaxJitterMillis() = 0.599
getMaxJitterMillis() = 0.37
getMaxJitterMillis() = 0.352

I have included the code for how I measured Jitter at the end of this question.

The interesting part, however, is the following. Yes, it does happen during the 'JVM warmup' period and is thus not 'normal', but it is what I would like to understand in more detail:

2.4519521584902644 uncontended byte array reads/ns  [maxJitter=2561.222ms totalTestRun=4078.383ms]

Notice that the jitter is over 2.5 seconds. Usually I would put this down to GC. However I did trigger a System.gc() before the test runs, AND -XX:+PrintGCDetails shows no GC at this time. In fact there is no GC during any of the test runs, as next to no object allocation happens in this test of summing up pre-allocated bytes. It also happens every time I run a new test, so I do not suspect it to be interference from some other process that happens to run at random.

My curiosity rocketed when I noticed that, while the jitter is very high, the total runtime and indeed the number of array element reads per nanosecond remained more or less unchanged. So here is a situation where one thread is being heavily lagged on a 4-core machine while the worker thread itself is not lagged and there is no GC going on.

Investigating further I looked into what the Hotspot compiler was doing and found the following via -XX:+PrintCompilation:

2632   2%      com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ 14 (65 bytes)
6709   2%     made not entrant  com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ -2 (65 bytes)

The gap between these two lines being printed was about 2.5 seconds, right when the optimised code for the method containing the big for loops was marked as no longer entrant.

My understanding was that HotSpot compiles on a background thread, and when it is ready to swap in a new version of the code it waits for the code that is already running to reach a safe point, at which point the swap happens. In the case of a big for loop I assumed that safe point would be at the end of each loop iteration (which may have been unrolled somewhat). I would not expect a 2.5s delay unless this swap had to perform a stop-the-world pause across the JVM. Does it do that when de-optimising previously compiled code?

So my first question for the JVM internal experts is, am I on the right track here? Could the 2.5s delay be due to marking the method as 'made not entrant'; and if so, why does it have such an extreme impact on other threads? If that is not likely to be the cause, then any ideas on what else to investigate would be great.

(for completeness, here is the code that I use for measuring the Jitter)

private static class MeasureJitter extends Thread {
    private AtomicLong maxJitterWitnessedNS = new AtomicLong(0);

    public MeasureJitter() {
        setDaemon( true );
    }

    public void reset() {
        maxJitterWitnessedNS.set( 0 );
    }

    public double getMaxJitterMillis() {
        return maxJitterWitnessedNS.get()/1000000.0;
    }

    public void printMaxJitterMillis() {
        System.out.println( "getMaxJitterMillis() = " + getMaxJitterMillis() );
    }

    @Override
    public void run() {
        super.run();

        // Record the time before sleeping; we expect to wake up roughly 1ms later.
        long preSleepNS = System.nanoTime();
        while( true ) {
            try {
                Thread.sleep( 1 );
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

            long wakeupNS = System.nanoTime();
            // Anything beyond the requested 1ms (1,000,000ns) of sleep counts as jitter.
            long jitterNS = Math.max(0, wakeupNS - (preSleepNS+1000000));

            long max = Math.max( maxJitterWitnessedNS.get(), jitterNS );
            maxJitterWitnessedNS.lazySet( max );

            preSleepNS = wakeupNS;
        }
    }
}
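
For context, here is roughly how a thread like this might be driven around each test run; the harness names below (benchmark, targetCount, numberOfRuns) are illustrative rather than my actual framework:

MeasureJitter jitterMonitor = new MeasureJitter();
jitterMonitor.start();

// Hypothetical driver loop: reset the max before each run, report it afterwards.
for ( int run=0; run<numberOfRuns; run++ ) {
    jitterMonitor.reset();

    BenchmarkResult result = benchmark.invoke( targetCount );

    System.out.println( result + " [maxJitter=" + jitterMonitor.getMaxJitterMillis() + "ms]" );
}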
asked Feb 16 '23 by Chris K

2 Answers

This took some digging to find the smoking gun, but the lessons have been valuable, especially on how to prove and isolate the cause, so I thought it good to document them here.

The JVM was indeed waiting to perform a stop-the-world event. Alexey Ragozin has a very good blog post about this topic at http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html, and it was that post that got me onto the right track. He points out that safe points occur at JNI method boundaries and Java method calls, so the for loop that I have here has no safe points in it.
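
To make the 'no safe point in the loop' point concrete, here is a minimal sketch of the two loop shapes involved (illustrative code, not from my benchmark). On the HotSpot versions I have seen, a simple int-counted loop like the first is typically compiled without a safepoint poll on its back edge, whereas the long-counted variant is not treated as a counted loop and usually keeps a poll, so a thread running it can reach a safe point between iterations:

public class SafepointLoopSketch {

    // Illustrative: an int-counted loop with a simple bound. HotSpot typically
    // compiles this without a safepoint poll on the back edge, so a thread can
    // spin here for seconds without ever reaching a safe point.
    static long sumCounted( byte[] array, int repeats ) {
        long sum = 0;
        for ( int i=0; i<repeats; i++ ) {
            for ( int j=0; j<array.length; j++ ) {
                sum += array[j];
            }
        }
        return sum;
    }

    // Illustrative: using a long induction variable means this is no longer a
    // 'counted loop', and HotSpot usually keeps a safepoint poll on the back
    // edge, at the cost of a little throughput.
    static long sumUncounted( byte[] array, long repeats ) {
        long sum = 0;
        for ( long i=0; i<repeats; i++ ) {
            for ( int j=0; j<array.length; j++ ) {
                sum += array[j];
            }
        }
        return sum;
    }
}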

To understand stop-the-world events in Java, use the following JVM flags: -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

The first prints out the total duration of a stop the world event, and it is not limited to just GC. In my case here that printed out:

Total time for which application threads were stopped: 2.5880809 seconds

This proved that I had a problem with threads waiting to reach a safe point. The next two arguments print out why the JVM wanted to wait for a global safe point to be reached.

         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
4.144: EnableBiasedLocking              [      10          1              1    ]      [  2678     0  2678     0     0    ]  0   
Total time for which application threads were stopped: 2.6788891 seconds

So this says that the JVM waited 2678ms while trying to enable biased locking. Why is this a stop-the-world event? Thankfully Martin Thompson has also hit this problem in the past and he has documented it here. It turns out that the Oracle JVM has quite a lot of thread contention during startup, and during that window biased locking has a very high cost, so the JVM delays turning the optimisation on for four seconds. What happened here is that my micro benchmark ran past that four-second mark while its loop had no safe points in it, so when the JVM tried to turn biased locking on it had to wait.

Candidate solutions that all worked for me were:

  1. -XX:-UseBiasedLocking (turn biased locking off)
  2. -XX:BiasedLockingStartupDelay=0 (enable biased locking immediately)
  3. Change the loop to have a safe point within it (e.g. a method call that does not get optimised out or inlined); see the sketch below
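
As a rough illustration of option 3, here is one way the benchmark loop could be restructured (the sumArray helper is illustrative, not from my original code). The idea is that returning from a real, non-inlined call gives the thread a chance to reach a safe point once per outer iteration; note that HotSpot aggressively inlines small methods, so in practice you would want to confirm the call survives (e.g. with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining):

@Override
public BenchmarkResult invoke( int targetCount ) {
    long sum = 0;
    for ( int i=0; i<targetCount; i++ ) {
        // Intended to remain a real (non-inlined) call so that the thread
        // can reach a safe point between outer iterations.
        sum += sumArray( array );
    }

    return new BenchmarkResult( ((long)targetCount)*arraySize, "uncontended byte array reads", sum );
}

private long sumArray( byte[] array ) {
    long sum = 0;
    for ( int j=0; j<array.length; j++ ) {
        sum += array[j];
    }
    return sum;
}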
answered Feb 19 '23 by Chris K


There are many causes of jitter:

  • sleep is very unreliable at the millisecond level.
  • context switches
  • interrupts
  • cache misses due to other programs running

Even if you busy wait, bind the thread to a CPU which has been isolated (e.g. with isolcpus), and move all the interrupts you can off that CPU, you can still see small amounts of jitter. All you can do is reduce it.
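
As a rough sketch of the busy-wait approach (illustrative code, not from any particular library), you can drop the sleep entirely and just watch for gaps between consecutive nanoTime() readings; any large gap means the thread was not running for that period:

// Minimal busy-wait jitter probe: no Thread.sleep, so sleep granularity is
// taken out of the measurement. For the cleanest numbers, pin this thread to
// an isolated core (e.g. via taskset on a CPU reserved with isolcpus).
public class BusyWaitJitterProbe {
    public static void main( String[] args ) {
        long maxGapNS  = 0;
        long previousNS = System.nanoTime();

        while ( true ) {
            long nowNS = System.nanoTime();
            long gapNS = nowNS - previousNS;

            if ( gapNS > maxGapNS ) {
                maxGapNS = gapNS;
                System.out.println( "new max gap: " + (maxGapNS/1000000.0) + "ms" );
            }

            previousNS = nowNS;
        }
    }
}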

BTW: jHiccup does exactly what you are doing to measure the jitter of your system.

answered Feb 19 '23 by Peter Lawrey