How can data written to a file really be flushed/synced to the block device from Java?
I tried this code with NIO:

    FileOutputStream s = new FileOutputStream(filename);
    FileChannel c = s.getChannel();
    while (xyz) {
        c.write(buffer);
    }
    c.force(true);
    s.getFD().sync();
    c.close();
I supposed that c.force(true) together with s.getFD().sync() should be sufficient, because the doc for force states:
Forces any updates to this channel's file to be written to the storage device that contains it. If this channel's file resides on a local storage device then when this method returns it is guaranteed that all changes made to the file since this channel was created, or since this method was last invoked, will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash.
The documentation for sync states:
Force all system buffers to synchronize with the underlying device. This method returns after all modified data and attributes of this FileDescriptor have been written to the relevant device(s). In particular, if this FileDescriptor refers to a physical storage medium, such as a file in a file system, sync will not return until all in-memory modified copies of buffers associated with this FileDescriptor have been written to the physical medium. sync is meant to be used by code that requires physical storage (such as a file) to be in a known state.
Shouldn't these two calls be sufficient? It seems they aren't.
Background: I did a small performance comparison (2 GB, sequential write) in C and Java, and the Java version is twice as fast as the C version, which is probably faster than the hardware allows (120 MB/s on a single HD). I also tried to execute the command line tool sync via Runtime.getRuntime().exec("sync"), but that didn't change the behavior.
The C code, which results in 70 MB/s (using the low-level APIs open/write/close instead doesn't change much), is:

    FILE* fp = fopen(filename, "w");
    while (xyz) {
        fwrite(buffer, 1, BLOCK_SIZE, fp);
    }
    fflush(fp);
    fclose(fp);
    sync();
Without the final call to sync, I got unrealistic values (over 1 GB/s, i.e. main-memory performance).
Why is there such a big difference between C and Java? I see two possibilities: either I don't sync the data correctly in Java, or the C code is suboptimal for some reason.
Update: I have done strace runs with "strace -cfT cmd". Here are the results:
C (Low-Level API): 67.389782 MB/s

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     87.21    0.200012      200012         1           fdatasync
     11.05    0.025345           1     32772           write
      1.74    0.004000        4000         1           sync
C (High-Level API): 61.796458 MB/s

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     73.19    0.144009      144009         1           sync
     26.81    0.052739           1     65539           write
Java (1.6 SUN JRE, java.io API): 128.6755466197537 MB/s

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     80.07  105.387609        3215     32776           write
      2.58    3.390060        3201      1059           read
      0.62    0.815251      815251         1           fsync
Java (1.6 SUN JRE, java.nio API): 127.45830221558376 MB/s

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      5.52    0.980061      490031         2           fsync
      1.60    0.284752           9     32774           write
      0.00    0.000000           0        80           close
The time values seem to be system time only and are therefore pretty meaningless.
Update 2: I switched to another server, rebooted, and used a freshly formatted ext3. Now I get only a 4% difference between Java and C. I simply don't know what went wrong; sometimes things are strange. I should have tried the measurement on another system before writing this question. Sorry.
Update 3: To summarize the answers: in C, sync the single file descriptor with fsync()/fdatasync() instead of calling the system-wide sync().
Update 4: Please note the following follow-up question.
Actually, in C you want to just call fsync() on the one file descriptor, not sync() (or the "sync" command), which signals the kernel to flush all buffers to disk system-wide.
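A minimal C sketch of that difference (the helper name write_and_sync and the error handling are illustrative, not taken from the question):

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Write buf to filename and flush only this file to the device. */
    int write_and_sync(const char *filename, const char *buf, size_t len)
    {
        int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) {  /* error handling shortened */
            close(fd);
            return -1;
        }
        /* fsync(fd) flushes this one descriptor's dirty pages;
           sync() would flush every dirty buffer in the system */
        if (fsync(fd) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }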
If you strace (getting Linux-specific here) the JVM, you should be able to observe an fsync() or fdatasync() system call being made on your output file. That is what I'd expect the getFD().sync() call to do. I assume c.force(true) simply flags to NIO that fsync() should be called after each write. It might simply be that the JVM you're using doesn't actually implement the sync() call?
I'm not sure why you weren't seeing any difference when calling "sync" as a command: but obviously, after the first sync invocation, subsequent ones are usually quite a lot faster. Again, I'd be inclined to break out strace (truss on Solaris) as a "what's actually happening here?" tool.
It is a good idea to use synchronized I/O data integrity completion. However, your C sample is using the wrong method: you call sync(), which syncs the whole OS.
If you want to write the blocks of that single file to disk, you need to use fsync(2) or fdatasync(2) in C. BTW: when you use buffered stdio in C (or a BufferedOutputStream or some Writer in Java), you need to flush it first before you sync, as in the sketch below.
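A sketch of that ordering with buffered stdio (flush_and_sync is a hypothetical helper; fp is assumed to be a FILE* opened for writing):

    #include <stdio.h>
    #include <unistd.h>

    int flush_and_sync(FILE *fp)
    {
        /* first push stdio's userspace buffer into the kernel... */
        if (fflush(fp) != 0)
            return -1;
        /* ...then force the kernel's buffers for this file to the device;
           fdatasync(fileno(fp)) would skip non-essential metadata */
        return fsync(fileno(fp));
    }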
The fdatasync() variant is a bit more efficient if the file has not changed name or size since you last synced, but it might also not persist all of the metadata. If you want to write your own transactionally safe database system, you need to take care of some more things (like fsyncing the parent directory).
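For example, fsyncing the parent directory could look like this on POSIX systems (sync_parent_dir is a hypothetical helper, not part of any library):

    #include <fcntl.h>
    #include <unistd.h>

    /* A newly created or renamed file only becomes durable once the
       directory entry referring to it has been flushed as well. */
    int sync_parent_dir(const char *dirpath)
    {
        int dirfd = open(dirpath, O_RDONLY);
        if (dirfd < 0)
            return -1;
        int rc = fsync(dirfd);
        close(dirfd);
        return rc;
    }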