
Compare Direct and Non-Direct ByteBuffer get/put operations

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update (for writes) the direct ByteBuffer in full from the byte array?

asked Jun 24 '12 by user882659


2 Answers

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If you are comparing a heap buffer with a direct buffer that does not use the native byte order (most systems are little-endian, while the default for a direct ByteBuffer is big-endian), the performance is very similar.

If you use native-ordered byte buffers, the performance can be significantly better for multi-byte values. For single bytes it makes little difference no matter what you do.
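To illustrate the byte-order point, here is a minimal sketch (the class name OrderDemo is made up for this example): a new direct ByteBuffer defaults to big-endian on every platform, so you have to opt in to the native order explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OrderDemo {
    public static void main(String[] args) {
        // Every new ByteBuffer, direct or heap, defaults to BIG_ENDIAN.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        System.out.println(direct.order());  // BIG_ENDIAN

        // Opt in to the platform's byte order so multi-byte get/put
        // can map onto plain loads and stores.
        direct.order(ByteOrder.nativeOrder());
        System.out.println(direct.order());  // LITTLE_ENDIAN on x86 and most ARM
    }
}
```

Without the order(...) call, every multi-byte getInt/putInt on a little-endian machine has to byte-swap.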

In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM-dependent; AFAIK recent versions of the Android VM treat them as intrinsics as well.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine-code instruction, i.e. they don't have the overhead of a JNI call.

In fact, if you are into micro-tuning, you may find that most of the time of a ByteBuffer getXxxx or setXxxx is spent in the bounds checking, not in the actual memory access. For this reason I still use Unsafe directly when I have to for maximum performance. (Note: this is discouraged by Oracle.)

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update (for writes) the direct ByteBuffer in full from the byte array?

I would hate to see what that would be better than. ;) It sounds very complicated.

Often the simplest solutions are better and faster.


You can test this yourself with this code.

public static void main(String... args) {
    ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 10; i++)
        runTest(bb1, bb2);
}

private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
    bb1.clear();
    bb2.clear();
    long start = System.nanoTime();
    // copy bb1 into bb2, one int at a time
    while (bb2.remaining() > 0)
        bb2.putInt(bb1.getInt());
    long time = System.nanoTime() - start;
    int operations = bb1.capacity() / 4 * 2; // one get plus one put per int
    System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
}

prints

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

I am pretty sure a JNI call takes longer than 1.2 ns.


To demonstrate that it's not the "JNI" call but the guff around it which causes the delay, you can write the same loop using Unsafe directly.

public static void main(String... args) {
    ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 10; i++)
        runTest(bb1, bb2);
}

private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
    Unsafe unsafe = getTheUnsafe();
    long start = System.nanoTime();
    // raw off-heap addresses of the direct buffers (via sun.nio.ch.DirectBuffer)
    long addr1 = ((DirectBuffer) bb1).address();
    long addr2 = ((DirectBuffer) bb2).address();
    // copy one int at a time, with no bounds checking at all
    for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
        unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
    long time = System.nanoTime() - start;
    int operations = bb1.capacity() / 4 * 2;
    System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
}

public static Unsafe getTheUnsafe() {
    try {
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        return (Unsafe) theUnsafe.get(null);
    } catch (Exception e) {
        throw new AssertionError(e);
    }
}

prints

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

So you can see that the native call is much faster than you might expect for a JNI call. The main reason for this delay could be the L2 cache speed. ;)

All runs were on an i3 at 3.3 GHz.

answered Sep 18 '22 by Peter Lawrey


A direct buffer holds the data in JNI land, so get() and put() have to cross the JNI boundary. A non-direct buffer holds the data in JVM land.

So:

  1. If you aren't playing with the data at all in Java land, e.g. just copying a channel to another channel, direct buffers are faster, as the data never has to cross the JNI boundary at all.

  2. Conversely, if you are playing with the data in Java land, a non-direct buffer will be faster. Whether it's significant depends on how much data has to cross the JNI boundary, and on what quanta are transferred each time. For example, getting or putting a single byte at a time from/to a direct buffer can get very expensive, whereas getting/putting 16384 bytes at a time amortizes the JNI boundary cost considerably.
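To make the second point concrete, here is a minimal sketch (the class name BulkCopyDemo is made up) contrasting per-byte access with a single bulk transfer into a byte[]:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BulkCopyDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16384);
        for (int i = 0; i < direct.capacity(); i++)
            direct.put(i, (byte) i);  // absolute put; position stays at 0

        // Per-byte: every get(i) pays the access overhead separately.
        byte[] oneAtATime = new byte[direct.capacity()];
        for (int i = 0; i < oneAtATime.length; i++)
            oneAtATime[i] = direct.get(i);

        // Bulk: a single relative get copies the whole array in one call.
        byte[] inOneGo = new byte[direct.capacity()];
        direct.get(inOneGo);

        System.out.println(Arrays.equals(oneAtATime, inOneGo));  // true
    }
}
```

Both approaches produce the same bytes; the bulk version simply crosses the boundary once instead of 16384 times.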

To answer your second paragraph: I would use a local byte[] array, not a thread-local one, but if I were playing with the data in Java land I wouldn't use a direct byte buffer at all. As the Javadoc says, direct byte buffers should only be used where they deliver a measurable performance benefit.

answered Sep 19 '22 by user207421