
Why is RandomAccessFile writeLong implemented with multiple write calls?

While profiling an application I noticed that RandomAccessFile.writeLong was taking a lot of time.

I checked the code for this method, and it makes eight calls to the native method write. I wrote an alternative implementation of writeLong using a byte[], something like this:

RandomAccessFile randomAccessFile = new RandomAccessFile("out.dat", "rwd");
...
byte[] aux = new byte[8];
aux[0] = (byte) ((l >>> 56) & 0xFF);
aux[1] = (byte) ((l >>> 48) & 0xFF);
aux[2] = (byte) ((l >>> 40) & 0xFF);
aux[3] = (byte) ((l >>> 32) & 0xFF);
aux[4] = (byte) ((l >>> 24) & 0xFF);
aux[5] = (byte) ((l >>> 16) & 0xFF);
aux[6] = (byte) ((l >>> 8) & 0xFF);
aux[7] = (byte) ((l >>> 0) & 0xFF);
randomAccessFile.write(aux);
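
The manual shifts above produce the bytes most significant first (big-endian), which is the order writeLong uses. The same encoding can be sketched with java.nio.ByteBuffer, whose default byte order is also big-endian. The class name LongEncoding and the method toBigEndian are hypothetical helpers for illustration, not part of the JDK:

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of the JDK.
class LongEncoding {
    // Returns the 8 bytes of value, most significant byte first.
    static byte[] toBigEndian(long value) {
        // A freshly allocated ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }
}
```

With this helper the write becomes randomAccessFile.write(LongEncoding.toBigEndian(l)), still a single call per long, at the cost of one small allocation per call.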

I made a small benchmark and got these results:

Using writeLong():
Average time for invocation: 91 ms

Using write(byte[]):
Average time for invocation: 11 ms

The tests were run on a Linux machine with an Intel(R) CPU T2300 @ 1.66GHz.

Since native calls carry some performance penalty, why is writeLong implemented that way? I know the question should really be put to the Sun guys, but I hope someone here has some hints.

Thank you.

jassuncao asked Apr 21 '11 09:04


2 Answers

It appears that RandomAccessFile.writeLong() doesn't minimise the number of calls to the OS. The cost increases dramatically when using "rwd" instead of "rw", which should be enough to indicate that it's not the calls themselves that cost the time. (It's the fact that the OS is trying to commit every write to disk, and the disk only spins so fast.)

import java.io.IOException;
import java.io.RandomAccessFile;

public class WriteLongBenchmark {
    public static void main(String[] args) throws IOException {
        int longCount = 10000;

        // Case 1: writeLong(), which per the question makes eight native write calls per long.
        try (RandomAccessFile raf = new RandomAccessFile("test.dat", "rwd")) {
            long start = System.nanoTime();
            for (long l = 0; l < longCount; l++)
                raf.writeLong(l);
            long time = System.nanoTime() - start;
            System.out.printf("writeLong() took %,d us on average%n", time / longCount / 1000);
        }

        // Case 2: encode each long into a reused byte[8] and make one write call per long.
        try (RandomAccessFile raf = new RandomAccessFile("test2.dat", "rwd")) {
            long start = System.nanoTime();
            byte[] aux = new byte[8];
            for (long l = 0; l < longCount; l++) {
                aux[0] = (byte) (l >>> 56);
                aux[1] = (byte) (l >>> 48);
                aux[2] = (byte) (l >>> 40);
                aux[3] = (byte) (l >>> 32);
                aux[4] = (byte) (l >>> 24);
                aux[5] = (byte) (l >>> 16);
                aux[6] = (byte) (l >>> 8);
                aux[7] = (byte) l;
                raf.write(aux);
            }
            long time = System.nanoTime() - start;
            System.out.printf("write byte[8] took %,d us on average%n", time / longCount / 1000);
        }
    }
}

prints

writeLong() took 2,321 us on average
write byte[8] took 576 us on average
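
To check the claim that the file mode matters more than the number of calls, the same single-call write can be timed under "rw" and "rwd". This is only a rough sketch: the class name ModeComparison, the temp-file name, and the iteration count are arbitrary choices, and the numbers will vary widely with the disk, the filesystem, and the write-cache settings.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical timing sketch; not a rigorous benchmark.
class ModeComparison {
    // Average time per write(byte[8]) call, in microseconds, for the given open mode.
    static long averageMicros(File file, String mode, int count) throws IOException {
        byte[] aux = new byte[8];
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                raf.write(aux);
            }
            return (System.nanoTime() - start) / count / 1000;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("mode-test", ".dat");
        file.deleteOnExit();
        for (String mode : new String[] {"rw", "rwd"}) {
            System.out.printf("mode %s: %,d us on average%n",
                    mode, averageMicros(file, mode, 1000));
        }
    }
}
```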

It would appear to me that you have no disk write caching on. Without disk caching, I would expect each committed write to take about 11 ms on a 5400 RPM disk, i.e. one full revolution: 60,000 ms / 5400 ≈ 11 ms.
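
That estimate is the time for one full revolution of the platter. The same arithmetic also lines up with the numbers in the question: eight committed single-byte writes at roughly 11 ms each come to about 89 ms, close to the 91 ms reported for writeLong(), while one committed write(byte[]) matches the reported 11 ms. A small sketch of the calculation (the class name is hypothetical):

```java
// Hypothetical helper that reproduces the rotational-latency estimate.
class RotationEstimate {
    // Milliseconds for one full platter revolution at the given spindle speed.
    static double millisPerRevolution(int rpm) {
        return 60_000.0 / rpm;
    }

    public static void main(String[] args) {
        double perWrite = millisPerRevolution(5400);
        System.out.printf("one committed write:    ~%.1f ms%n", perWrite);
        System.out.printf("eight committed writes: ~%.1f ms%n", 8 * perWrite);
    }
}
```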

Peter Lawrey answered Nov 07 '22 03:11


I would vote for laziness, or (being more charitable) not thinking about the consequences.

A native implementation of writeLong() would potentially require versions for every architecture, to deal with byte ordering (JNI will convert to platform byte order). By keeping the translation in the "cross-platform" layer, the developers simplified the job of porting.

As to why they didn't convert to an array while on the Java side, I suspect that was due to fear of garbage collection. I would guess that RandomAccessFile has changed minimally since 1.1, and it wasn't until 1.3 that garbage collection started to make small object allocations "free".
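
If allocation was the worry, one way around it is to allocate the array once and reuse it for every call. A minimal sketch, assuming a hypothetical wrapper class LongWriter (this is not how RandomAccessFile itself is implemented):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical wrapper for illustration; not the JDK's implementation.
class LongWriter {
    private final RandomAccessFile file;
    private final byte[] scratch = new byte[8]; // allocated once, reused by every call

    LongWriter(RandomAccessFile file) {
        this.file = file;
    }

    // Not thread-safe: concurrent callers would share the scratch array.
    void writeLong(long v) throws IOException {
        for (int i = 0; i < 8; i++) {
            scratch[i] = (byte) (v >>> (56 - 8 * i)); // most significant byte first
        }
        file.write(scratch); // one write call instead of eight
    }
}
```

The trade-off is that the shared array makes the wrapper unsafe for concurrent use without external synchronization.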

But there's an alternative to RandomAccessFile: take a look at MappedByteBuffer.
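
A minimal sketch of what that could look like, assuming the number of longs is known up front so the region can be mapped in one go. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical example of writing longs through a memory-mapped region.
class MappedLongs {
    // Writes 0..count-1 as big-endian longs starting at offset 0 of the file.
    static void writeLongs(String path, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            // Mapping READ_WRITE past the current end of the file grows the file.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 8L * count);
            for (long l = 0; l < count; l++) {
                buffer.putLong(l); // plain memory write, no system call per long
            }
            buffer.force(); // ask the OS to flush the mapped pages to storage
        }
    }
}
```

Note that this does not give the per-write durability that "rwd" asks for; changes reach the disk when the OS decides to write them back or when force() is called.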


Edit: I have a machine with JDK 1.2.2, and this method has not changed since then.

Anon answered Nov 07 '22 03:11