Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java ByteBuffer performance issue

While processing multiple gigabyte files I noticed something odd: it seems that reading from a file using a filechannel into a re-used ByteBuffer object allocated with allocateDirect is much slower than reading from a MappedByteBuffer, in fact it is even slower than reading into byte-arrays using regular read calls!

I was expecting it to be (almost) as fast as reading from mappedbytebuffers as my ByteBuffer is allocated with allocateDirect, hence the read should end-up directly in my bytebuffer without any intermediate copies.

My question now is: what is it that I'm doing wrong? Or is bytebuffer+filechannel really slowe r than regular io/mmap?

I the example code below I also added some code that converts what is read into long values, as that is what my real code constantly does. I would expect that the ByteBuffer getLong() method is much faster than my own byte shuffeler.

Test-results: mmap: 3.828 bytebuffer: 55.097 regular i/o: 38.175

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class testbb {
    static final int size = 536870904, n = size / 24;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // create file
        fileHandle = new RandomAccessFile("file.dat", "rw");
        byte [] buffer = new byte[24];
        for(int index=0; index<n; index++)
            fileHandle.write(buffer);
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        byte [] buffer1 = new byte[24];
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
                mbb.position(index * 24);
                mbb.get(buffer1, 0, 24);
                long dummy1 = byteArrayToLong(buffer1, 0);
                long dummy2 = byteArrayToLong(buffer1, 8);
                long dummy3 = byteArrayToLong(buffer1, 16);
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(24);
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
            buffer2.rewind();
            fileChannel.read(buffer2, index * 24);
            buffer2.rewind();   // need to rewind it to be able to use it
            long dummy1 = buffer2.getLong();
            long dummy2 = buffer2.getLong();
            long dummy3 = buffer2.getLong();
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        byte [] buffer3 = new byte[24];
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
                fileHandle.seek(index * 24);
                fileHandle.read(buffer3);
                long dummy1 = byteArrayToLong(buffer1, 0);
                long dummy2 = byteArrayToLong(buffer1, 8);
                long dummy3 = byteArrayToLong(buffer1, 16);
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}

As loading large sections and then processing is them is not an option (I'll be reading data all over the place) I think I should stick to a MappedByteBuffer. Thank you all for your suggestions.

like image 949
Folkert van Heusden Avatar asked Oct 12 '11 10:10

Folkert van Heusden


People also ask

What is ByteBuffer in Java?

ByteBuffer holds a sequence of integer values to be used in an I/O operation. The ByteBuffer class provides the following four categories of operations upon long buffers: Absolute and relative get method that read single bytes. Absolute and relative put methods that write single bytes.

What is the difference between ByteBuffer and byte array?

There are several differences between a byte array and ByteBuffer class in Java, but the most important of them is that bytes from byte array always reside in Java heap space, but bytes in a ByteBuffer may potentially reside outside of the Java heap in case of direct byte buffer and memory mapped files.

What does ByteBuffer wrap do?

wrap. Wraps a byte array into a buffer. The new buffer will be backed by the given byte array; that is, modifications to the buffer will cause the array to be modified and vice versa. The new buffer's capacity will be array.

How do I allocate ByteBuffer?

A new ByteBuffer can be allocated using the method allocate() in the class java. nio. ByteBuffer. This method requires a single parameter i.e. the capacity of the buffer.


1 Answers

I believe you are just doing micro-optimization, which might just not matter (www.codinghorror.com).

Below is a version with a larger buffer and redundant seek / setPosition calls removed.

  • When I enable "native byte ordering" (which is actually unsafe if the machine uses a different 'endian' convention):
mmap: 1.358
bytebuffer: 0.922
regular i/o: 1.387
  • When I comment out the order statement and use the default big-endian ordering:
mmap: 1.336
bytebuffer: 1.62
regular i/o: 1.467
  • Your original code:
mmap: 3.262
bytebuffer: 106.676
regular i/o: 90.903

Here's the code:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class Testbb2 {
    /** Buffer a whole lot of long values at the same time. */
    static final int BUFFSIZE = 0x800 * 8; // 8192
    static final int DATASIZE = 0x8000 * BUFFSIZE;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // Sanity check - this way the convert-to-long loops don't need extra bookkeeping like BUFFSIZE / 8.
        if ((DATASIZE % BUFFSIZE) > 0 || (DATASIZE % 8) > 0) {
            throw new IllegalStateException("DATASIZE should be a multiple of 8 and BUFFSIZE!");
        }

        int pos;
        int nDone;

        // create file
        File testFile = new File("file.dat");
        fileHandle = new RandomAccessFile("file.dat", "rw");

        if (testFile.exists() && testFile.length() >= DATASIZE) {
            System.out.println("File exists");
        } else {
            testFile.delete();
            System.out.println("Preparing file");
            byte [] buffer = new byte[BUFFSIZE];
            pos = 0;
            nDone = 0;
            while (pos < DATASIZE) {
                fileHandle.write(buffer);
                pos += buffer.length;
            }

            System.out.println("File prepared");
        } 
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, DATASIZE);
        byte [] buffer1 = new byte[BUFFSIZE];
        mbb.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE) {
            mbb.get(buffer1, 0, BUFFSIZE);
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer1, i);
            }
            pos += BUFFSIZE;
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(BUFFSIZE);
//        buffer2.order(ByteOrder.nativeOrder());
        buffer2.order();
        fileChannel.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        nDone = 0;
        while (pos < DATASIZE) {
            buffer2.rewind();
            fileChannel.read(buffer2);
            buffer2.rewind();   // need to rewind it to be able to use it
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = buffer2.getLong();
            }
            pos += BUFFSIZE;
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        fileHandle.seek(0);
        byte [] buffer3 = new byte[BUFFSIZE];
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE && nDone != -1) {
            nDone = 0;
            while (nDone != -1  && nDone < BUFFSIZE) {
                nDone = fileHandle.read(buffer3, nDone, BUFFSIZE - nDone);
            }
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer3, i);
            }
            pos += nDone;
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}
like image 51
JBert Avatar answered Sep 28 '22 16:09

JBert