
Compare Direct and Non-Direct ByteBuffer get/put operations

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update (for writes) the direct ByteBuffer in full from the byte array?

asked Jun 24 '12 by user882659


2 Answers

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If you are comparing a heap buffer with a direct buffer that does not use the native byte order (most systems are little-endian, while the default for a direct ByteBuffer is big-endian), the performance is very similar.

If you use native-ordered byte buffers, the performance can be significantly better for multi-byte values. For single bytes it makes little difference no matter what you do.
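To illustrate the byte-order point, here is a minimal sketch (the class name OrderDemo is made up for this example): a new direct ByteBuffer defaults to big-endian on every platform, so you have to opt in to the native order explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OrderDemo {
    public static void main(String[] args) {
        // Every new ByteBuffer, direct or heap, defaults to BIG_ENDIAN.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        System.out.println(direct.order());  // BIG_ENDIAN

        // Opt in to the platform's byte order so multi-byte get/put
        // can map onto plain loads and stores.
        direct.order(ByteOrder.nativeOrder());
        System.out.println(direct.order());  // LITTLE_ENDIAN on x86 and most ARM
    }
}
```

Without the order(...) call, every multi-byte getInt/putInt on a little-endian machine has to byte-swap.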

In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM-dependent; AFAIK recent versions of the Android VM treat them as intrinsics as well.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine-code instruction, i.e. they don't have the overhead of a JNI call.

In fact, if you are into micro-tuning, you may find that most of the time of a ByteBuffer getXxxx or setXxxx is spent in the bounds checking, not in the actual memory access. For this reason I still use Unsafe directly when I have to for maximum performance. (Note: this is discouraged by Oracle.)

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update (for writes) the direct ByteBuffer in full from the byte array?

I would hate to see what that would be better than. ;) It sounds very complicated.

Often the simplest solutions are better and faster.


You can test this yourself with this code.

public static void main(String... args) {
    ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 10; i++)
        runTest(bb1, bb2);
}

private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
    bb1.clear();
    bb2.clear();
    long start = System.nanoTime();
    // copy bb1 into bb2, one int at a time
    while (bb2.remaining() > 0)
        bb2.putInt(bb1.getInt());
    long time = System.nanoTime() - start;
    int operations = bb1.capacity() / 4 * 2; // one get plus one put per int
    System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
}

prints

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

I am pretty sure a JNI call takes longer than 1.2 ns.


To demonstrate that it's not the "JNI" call but the guff around it which causes the delay, you can write the same loop using Unsafe directly.

public static void main(String... args) {
    ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 10; i++)
        runTest(bb1, bb2);
}

private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
    Unsafe unsafe = getTheUnsafe();
    long start = System.nanoTime();
    // raw off-heap addresses of the direct buffers (via sun.nio.ch.DirectBuffer)
    long addr1 = ((DirectBuffer) bb1).address();
    long addr2 = ((DirectBuffer) bb2).address();
    // copy one int at a time, with no bounds checking at all
    for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
        unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
    long time = System.nanoTime() - start;
    int operations = bb1.capacity() / 4 * 2;
    System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
}

public static Unsafe getTheUnsafe() {
    try {
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        return (Unsafe) theUnsafe.get(null);
    } catch (Exception e) {
        throw new AssertionError(e);
    }
}

prints

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

So you can see that the native call is much faster than you might expect for a JNI call. The main reason for this delay could be the L2 cache speed. ;)

All runs were on an i3 at 3.3 GHz.

answered Sep 18 '22 by Peter Lawrey


A direct buffer holds the data in JNI land, so get() and put() have to cross the JNI boundary. A non-direct buffer holds the data in JVM land.

So:

  1. If you aren't playing with the data at all in Java land, e.g. just copying a channel to another channel, direct buffers are faster, as the data never has to cross the JNI boundary at all.

  2. Conversely, if you are playing with the data in Java land, a non-direct buffer will be faster. Whether it's significant depends on how much data has to cross the JNI boundary, and on what quanta are transferred each time. For example, getting or putting a single byte at a time from/to a direct buffer can get very expensive, whereas getting/putting 16384 bytes at a time amortizes the JNI boundary cost considerably.
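To make the second point concrete, here is a minimal sketch (the class name BulkCopyDemo is made up) contrasting per-byte access with a single bulk transfer into a byte[]:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BulkCopyDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16384);
        for (int i = 0; i < direct.capacity(); i++)
            direct.put(i, (byte) i);  // absolute put; position stays at 0

        // Per-byte: every get(i) pays the access overhead separately.
        byte[] oneAtATime = new byte[direct.capacity()];
        for (int i = 0; i < oneAtATime.length; i++)
            oneAtATime[i] = direct.get(i);

        // Bulk: a single relative get copies the whole array in one call.
        byte[] inOneGo = new byte[direct.capacity()];
        direct.get(inOneGo);

        System.out.println(Arrays.equals(oneAtATime, inOneGo));  // true
    }
}
```

Both approaches produce the same bytes; the bulk version simply crosses the boundary once instead of 16384 times.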

To answer your second paragraph: I would use a local byte[] array, not a thread-local one, but if I were playing with the data in Java land I wouldn't use a direct byte buffer at all. As the Javadoc says, direct byte buffers should only be used where they deliver a measurable performance benefit.

answered Sep 19 '22 by user207421