
Fastest way to write an array of integers to a file in Java?

As the title says, I'm looking for the fastest possible way to write integer arrays to files. The arrays will vary in size, and will realistically contain anywhere between 2500 and 25 000 000 ints.

Here's the code I'm presently using:

// try-with-resources flushes the buffer and closes the file when the loop is done
try (DataOutputStream writer = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(filename)))) {
    for (int d : data)
        writer.writeInt(d);
}

Given that DataOutputStream has a method for writing arrays of bytes, I've tried converting the int array to a byte array like this:

private static byte[] integersToBytes(int[] values) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(baos);
    for (int i = 0; i < values.length; ++i) {
        dos.writeInt(values[i]);
    }

    return baos.toByteArray();
}

and like this:

private static byte[] integersToBytes2(int[] src) {
    int srcLength = src.length;
    byte[] dst = new byte[srcLength << 2];

    for (int i = 0; i < srcLength; i++) {
        int x = src[i];
        int j = i << 2;
        dst[j++] = (byte) ((x >>> 0) & 0xff);
        dst[j++] = (byte) ((x >>> 8) & 0xff);
        dst[j++] = (byte) ((x >>> 16) & 0xff);
        dst[j++] = (byte) ((x >>> 24) & 0xff);
    }
    return dst;
}

Both seem to give a minor speed increase, about 5%. I've not tested them rigorously enough to confirm that.
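
One detail worth checking when comparing the two helpers: DataOutputStream.writeInt emits the high byte first (big-endian), while integersToBytes2 stores the low byte first (little-endian), so the two do not produce the same file. A minimal sketch of a conversion with an explicit byte order, assuming a hypothetical helper class named IntBytes that is not part of the original code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class IntBytes {
    // Converts ints to bytes in the requested byte order via a heap ByteBuffer.
    static byte[] toBytes(int[] values, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocate(values.length * Integer.BYTES).order(order);
        // The int view inherits the byte order set above; the bulk put
        // copies the whole array without a per-element Java loop.
        bb.asIntBuffer().put(values);
        return bb.array();
    }

    public static void main(String[] args) {
        int[] data = {0x01020304};
        // BIG_ENDIAN matches the layout DataOutputStream.writeInt produces.
        System.out.println(Arrays.toString(toBytes(data, ByteOrder.BIG_ENDIAN)));
        // LITTLE_ENDIAN matches the layout integersToBytes2 produces.
        System.out.println(Arrays.toString(toBytes(data, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Whichever order is chosen, the reader of the file has to use the same one.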

Are there any techniques that will speed up this file write operation, or relevant guides to best practice for Java IO write performance?

Ollie Glass asked Dec 05 '10 12:12




1 Answer

Benchmarks should be repeated every once in a while, shouldn't they? :) After fixing some bugs and adding my own writing variant, here are the results I get when running the benchmark on an ASUS ZenBook UX305 running Windows 10 (times given in seconds):

Running tests... 0 1 2
Buffered DataOutputStream           8,14      8,46      8,30
FileChannel alt2                    1,55      1,18      1,12
ObjectOutputStream                  9,60     10,41     11,68
FileChannel                         1,49      1,20      1,21
FileChannel alt                     5,49      4,58      4,66

And here are the results running on the same computer but with Arch Linux and the order of the write methods switched:

Running tests... 0 1 2
Buffered DataOutputStream          31,16      6,29      7,26
FileChannel                         1,07      0,83      0,82
FileChannel alt2                    1,25      1,71      1,42
ObjectOutputStream                  3,47      5,39      4,40
FileChannel alt                     2,70      3,27      3,46

Each test wrote an 800 MB file. The unbuffered DataOutputStream took way too long, so I excluded it from the benchmark.

As seen, writing using a file channel still beats the crap out of all other methods, but it matters a lot whether the byte buffer is memory-mapped or not. Without memory-mapping the file channel write took 3-5 seconds:

var bb = ByteBuffer.allocate(4 * ints.length);
for (int i : ints)
    bb.putInt(i);
bb.flip();
try (var fc = new FileOutputStream("fcalt.out").getChannel()) {
    fc.write(bb);
}

With memory-mapping, the time was reduced to between 0.8 and 1.5 seconds:

try (var fc = new RandomAccessFile("fcalt2.out", "rw").getChannel()) {
    var bb = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4 * ints.length);
    bb.asIntBuffer().put(ints);
}

But note that the results are order-dependent, especially on Linux. It appears that the memory-mapped methods don't write the data in full but rather hand the job off to the OS and return before it is completed. Whether that behaviour is desirable or not depends on the situation.
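
If the data has to be on disk before the method returns, MappedByteBuffer.force() asks the OS to write the mapped region out. A minimal sketch, assuming a hypothetical MappedIntWriter class and a sync flag that are not part of the benchmark; timings with force() were not measured here:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical wrapper around the memory-mapped write, for illustration only.
class MappedIntWriter {
    static void write(String filename, int[] ints, boolean sync) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(filename, "rw");
             FileChannel fc = raf.getChannel()) {
            // Computed as long; a single mapping cannot exceed Integer.MAX_VALUE bytes.
            long size = (long) ints.length * Integer.BYTES;
            MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_WRITE, 0, size);
            bb.asIntBuffer().put(ints);
            if (sync) {
                // Blocks until the changes to the mapped region have been
                // written to the storage device holding the file.
                bb.force();
            }
        }
    }
}
```

Calling it with sync set to false corresponds to the behaviour measured above, where the OS finishes the write in the background.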

Memory-mapping can also lead to OutOfMemory problems, so it is not always the right tool to use. See the question "Prevent OutOfMemory when using java.nio.MappedByteBuffer".
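
When neither a mapping nor a byte buffer as large as the whole array is acceptable, one option is to push the array through a small reusable buffer. This is only a sketch and was not part of the benchmark, so no timing is claimed for it; ChunkedIntWriter and chunkInts are hypothetical names:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical chunked writer, for illustration only.
class ChunkedIntWriter {
    // Writes ints in big-endian order, at most chunkInts values per channel write.
    static void write(Path file, int[] ints, int chunkInts) throws IOException {
        ByteBuffer bb = ByteBuffer.allocateDirect(chunkInts * Integer.BYTES);
        IntBuffer ib = bb.asIntBuffer();   // view sharing bb's memory
        try (FileChannel fc = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int off = 0; off < ints.length; off += chunkInts) {
                int n = Math.min(chunkInts, ints.length - off);
                ib.clear();
                ib.put(ints, off, n);          // bulk copy of one chunk
                bb.clear();                    // the view has its own position, so reset bb too
                bb.limit(n * Integer.BYTES);   // expose only the bytes just filled
                while (bb.hasRemaining()) {
                    fc.write(bb);              // a single call may write fewer bytes
                }
            }
        }
    }
}
```

The buffer size stays fixed regardless of the array length, at the cost of one channel write per chunk.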

Here is my version of the benchmark code: https://gist.github.com/bjourne/53b7eabc6edea27ffb042e7816b7830b

Björn Lindqvist answered Oct 19 '22 10:10