Using the following code as a benchmark, the system can write 10,000 rows to disk in a fraction of a second:
void withSync() {
int f = open( "/tmp/t8" , O_RDWR | O_CREAT );
lseek (f, 0, SEEK_SET );
int records = 10*1000;
clock_t ustart = clock();
for(int i = 0; i < records; i++) {
write(f, "012345678901234567890123456789" , 30);
fsync(f);
}
clock_t uend = clock();
close (f);
printf(" sync() seconds:%lf writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}
In the above code, 10,000 records can be written and flushed out to disk in a fraction of a second, output below:
sync() seconds:0.006268 writes per second:0.000002
In the Java version, it takes over 4 seconds to write 10,000 records. Is this just a limitation of Java, or am I missing something?
public void testFileChannel() throws IOException {
RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"),"rw");
FileChannel c = raf.getChannel();
c.force(true);
ByteBuffer b = ByteBuffer.allocateDirect(64*1024);
long s = System.currentTimeMillis();
for(int i=0;i<10000;i++){
b.clear();
b.put("012345678901234567890123456789".getBytes());
b.flip();
c.write(b);
c.force(false);
}
long e=System.currentTimeMillis();
raf.close();
System.out.println("With flush "+(e-s));
}
Returns this:
With flush 4263
Please help me understand what is the correct/fastest way to write records to disk in Java.
Note: I am using the RandomAccessFile
class in combination with a ByteBuffer
as ultimately we need random read/write access on this file.
Actually, I am surprised that test is not slower. The behavior of force is OS dependent but broadly it forces the data to disk. If you have an SSD you might achieve 40K writes per second, but with an HDD you won't. In the C example its clearly isn't committing the data to disk as even the fastest SSD cannot perform more than 235K IOPS (That the manufacturers guarantee it won't go faster than that :D )
If you need the data committed to disk every time, you can expect it to be slow and entirely dependent on the speed of your hardware. If you just need the data flushed to the OS and if the program crashes but the OS does not, you will not loose any data, you can write data without force. A faster option is to use memory mapped files. This will give you random access without a system call for each record.
I have a library Java Chronicle which can read/write 5-20 millions records per second with a latency of 80 ns in text or binary formats with random access and can be shared between processes. This only works this fast because it is not committing the data to disk on every record, but you can test that if the JVM crashes at any point, no data written to the chronicle is lost.
This code is more similar to what you wrote in C. Takes only 5 msec on my machine. If you really need to flush after every write, it takes about 60 msec. Your original code took about 11 seconds on this machine. BTW, closing the output stream also flushes.
public static void testFileOutputStream() throws IOException {
OutputStream os = new BufferedOutputStream( new FileOutputStream( "/tmp/fos" ) );
byte[] bytes = "012345678901234567890123456789".getBytes();
long s = System.nanoTime();
for ( int i = 0; i < 10000; i++ ) {
os.write( bytes );
}
long e = System.nanoTime();
os.close();
System.out.println( "outputstream " + ( e - s ) / 1e6 );
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With