Direct ByteBuffer relative vs absolute read performance

While testing the read performance of a direct java.nio.ByteBuffer, I noticed that absolute reads are on average about 2x faster than relative reads. Comparing the source code of the relative and absolute reads, the code is pretty much the same, except that the relative read maintains an internal counter. Why do I see such a considerable difference in speed?

Below is the source code of my JMH benchmark:

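import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class DirectByteBufferReadBenchmark {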

    private static final int OBJ_SIZE = 8 + 4 + 1;
    private static final int NUM_ELEM = 10_000_000;

    @State(Scope.Benchmark)
    public static class Data {

        private ByteBuffer directByteBuffer;

        @Setup
        public void setup() {
            directByteBuffer = ByteBuffer.allocateDirect(OBJ_SIZE * NUM_ELEM);
            for (int i = 0; i < NUM_ELEM; i++) {
                directByteBuffer.putLong(i);
                directByteBuffer.putInt(i);
                directByteBuffer.put((byte) (i & 1));
            }
        }
    }



    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public long testReadAbsolute(Data d) throws InterruptedException {
        long val = 0L;
        for (int i = 0; i < NUM_ELEM; i++) {
            int index = OBJ_SIZE * i;
            val += d.directByteBuffer.getLong(index);
            d.directByteBuffer.getInt(index + 8);
            d.directByteBuffer.get(index + 12);
        }
        return val;
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public long testReadRelative(Data d) throws InterruptedException {
        d.directByteBuffer.rewind();

        long val = 0L;
        for (int i = 0; i < NUM_ELEM; i++) {
            val += d.directByteBuffer.getLong();
            d.directByteBuffer.getInt();
            d.directByteBuffer.get();
        }

        return val;
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
            .include(DirectByteBufferReadBenchmark.class.getSimpleName())
            .warmupIterations(5)
            .measurementIterations(5)
            .forks(3)
            .threads(1)
            .build();

        new Runner(opt).run();
    }
}

And these are the results of my benchmark run:

Benchmark                                        Mode  Cnt   Score   Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   15  88.605 ± 9.276  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   15  42.904 ± 3.018  ops/s

The test was run on a MacBook Pro (2.2 GHz Intel Core i7, 16 GB DDR3) with JDK 1.8.0_73.

UPDATE

I ran the same test with JDK 9-ea b134. Both tests show a ~10% speed increase, but the speed difference between the two remains similar.

# JMH 1.13 (released 45 days ago)
# VM version: JDK 9-ea, VM 9-ea+134
# VM invoker: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/bin/java
# VM options: <none>


Benchmark                                        Mode  Cnt    Score    Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   15  102.170 ± 10.199  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   15   45.988 ±  3.896  ops/s

JDK 8 indeed generates worse code for the loop with relative ByteBuffer access.

JMH has a built-in perfasm profiler that prints the generated assembly code for the hottest regions. I used it to compare the compiled testReadAbsolute and testReadRelative; here are the main differences:

1. The relative getLong / getInt / get calls update the position field of the ByteBuffer. The VM does not optimize these updates away: there are 3 memory writes on each loop iteration.
2. The position range check is not eliminated: the conditional branches remain in the compiled code on each loop iteration.
3. Because the redundant field updates and range checks make the loop body longer, the VM unrolls only 2 iterations of the loop. The compiled loop with absolute access has 16 iterations unrolled.

testReadAbsolute is compiled very well: the main loop just reads 16 longs, sums them up, and jumps to the next iteration if index < 10_000_000 - 16. The state of directByteBuffer is not updated. The JVM is not as smart with testReadRelative, however: it seems that it cannot optimize away field accesses to an object that comes from outside the method.
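To make the extra per-call work of a relative read concrete, here is a minimal sketch. TinyBuffer is a hypothetical, heavily simplified model written for this illustration; it is not the JDK's ByteBuffer code, and the positionWrites counter exists only to make the field updates visible.

```java
import java.nio.BufferUnderflowException;

class PositionBookkeepingSketch {

    /** Hypothetical, simplified stand-in for a buffer of longs; not the JDK code. */
    static final class TinyBuffer {
        private final long[] slots;
        private final int limit;
        private int position;          // cursor used only by relative reads
        int positionWrites;            // counts cursor updates, for illustration only

        TinyBuffer(long[] slots) {
            this.slots = slots;
            this.limit = slots.length;
        }

        /** Absolute read: one bounds check, no state change. */
        long get(int index) {
            if (index < 0 || index >= limit) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return slots[index];
        }

        /** Relative read: check against the cursor, then write the cursor back. */
        long get() {
            if (position >= limit) {
                throw new BufferUnderflowException();
            }
            int p = position;
            position = p + 1;          // the extra field write on every call
            positionWrites++;
            return slots[p];
        }

        int position() {
            return position;
        }
    }

    static long sumAbsolute(TinyBuffer buf, int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += buf.get(i);         // index comes from the loop counter
        }
        return sum;
    }

    static long sumRelative(TinyBuffer buf, int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += buf.get();          // index comes from the buffer's own field
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {10L, 20L, 30L, 40L};

        TinyBuffer a = new TinyBuffer(data);
        // prints "100 0 0": same sum, cursor untouched
        System.out.println(sumAbsolute(a, data.length) + " " + a.position() + " " + a.positionWrites);

        TinyBuffer r = new TinyBuffer(data);
        // prints "100 4 4": same sum, one cursor write per read
        System.out.println(sumRelative(r, data.length) + " " + r.position() + " " + r.positionWrites);
    }
}
```

In the absolute loop the index is derived from the loop counter, so a compiler can reason about its range up front. In the relative loop the index lives in a field of the buffer object, so the check and the write-back stay in the loop unless the compiler can prove nothing else observes that field.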

A lot of work went into optimizing ByteBuffer in JDK 9. I ran the same test on JDK 9-ea b134 and verified that testReadRelative no longer has the redundant memory writes and range checks. It now runs almost as fast as testReadAbsolute.

// JDK 1.8.0_92, VM 25.92-b14

Benchmark                                        Mode  Cnt   Score   Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   10  99,727 ± 0,542  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   10  47,126 ± 0,289  ops/s

// JDK 9-ea, VM 9-ea+134

Benchmark                                        Mode  Cnt    Score   Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   10  109,369 ± 0,403  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   10   97,140 ± 0,572  ops/s

UPDATE

To help the JIT compiler with optimization, I introduced a local variable

ByteBuffer directByteBuffer = d.directByteBuffer;

in both benchmarks. Otherwise the extra level of indirection prevents the compiler from eliminating the ByteBuffer.position field updates.
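As a sketch of what that change looks like in both loops, the class below mirrors the benchmark bodies without JMH so it can be run directly. Data here is a plain stand-in for the @State class, and newData and the numElem parameter are hypothetical helpers added for this example only.

```java
import java.nio.ByteBuffer;

class LocalBufferReference {
    static final int OBJ_SIZE = 8 + 4 + 1;   // long + int + byte, as in the benchmark

    /** Plain stand-in for the benchmark's @State class (no JMH needed here). */
    static final class Data {
        ByteBuffer directByteBuffer;
    }

    /** Fills numElem records; numElem is a hypothetical parameter for this sketch. */
    static Data newData(int numElem) {
        Data d = new Data();
        d.directByteBuffer = ByteBuffer.allocateDirect(OBJ_SIZE * numElem);
        for (int i = 0; i < numElem; i++) {
            d.directByteBuffer.putLong(i);
            d.directByteBuffer.putInt(i);
            d.directByteBuffer.put((byte) (i & 1));
        }
        return d;
    }

    static long readAbsolute(Data d, int numElem) {
        ByteBuffer directByteBuffer = d.directByteBuffer;   // field read once, outside the loop
        long val = 0L;
        for (int i = 0; i < numElem; i++) {
            int index = OBJ_SIZE * i;
            val += directByteBuffer.getLong(index);
            directByteBuffer.getInt(index + 8);
            directByteBuffer.get(index + 12);
        }
        return val;
    }

    static long readRelative(Data d, int numElem) {
        ByteBuffer directByteBuffer = d.directByteBuffer;   // same local reference here
        directByteBuffer.rewind();
        long val = 0L;
        for (int i = 0; i < numElem; i++) {
            val += directByteBuffer.getLong();
            directByteBuffer.getInt();
            directByteBuffer.get();
        }
        return val;
    }

    public static void main(String[] args) {
        int numElem = 1_000;
        Data d = newData(numElem);
        System.out.println(readAbsolute(d, numElem));   // prints 499500
        System.out.println(readRelative(d, numElem));   // prints 499500
    }
}
```

This only shows the shape of the source change and checks that both loops still read the same values. Whether the JIT then removes the position updates depends on the JDK version, so any speed claim should still be checked with the JMH benchmark itself.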

Tags: java, performance, jvm, microbenchmark, jmh

Asked by Vladimir G.; answered by apangin.
