I have a module that is responsible for reading, processing, and writing bytes to disk. The bytes come in over UDP and, after the individual datagrams are assembled, the final byte array that gets processed and written to disk is typically between 200 bytes and 500,000 bytes. Occassionally, there will be byte arrays that, after assembly, are over 500,000 bytes, but these are relatively rare.
I'm currently using the FileOutputStream
's write(byte\[\])
method. I'm also experimenting with wrapping the FileOutputStream
in a BufferedOutputStream
, including using the constructor that accepts a buffer size as a parameter.
It appears that using the BufferedOutputStream
is tending toward slightly better performance, but I've only just begun to experiment with different buffer sizes. I only have a limited set of sample data to work with (two data sets from sample runs that I can pipe through my application). Is there a general rule-of-thumb that I might be able to apply to try to calculate the optimal buffer sizes to reduce disk writes and maximize the performance of the disk writing given the information that I know about the data I'm writing?
BufferedOutputStream helps when the writes are smaller than the buffer size e.g. 8 KB. For larger writes it doesn't help nor does it make it much worse. If ALL your writes are larger than the buffer size or you always flush() after every write, I would not use a buffer. However if a good portion of your writes are less that the buffer size and you don't use flush() every time, its worth having.
You may find increasing the buffer size to 32 KB or larger gives you a marginal improvement, or make it worse. YMMV
You might find the code for BufferedOutputStream.write useful
/**
* Writes <code>len</code> bytes from the specified byte array
* starting at offset <code>off</code> to this buffered output stream.
*
* <p> Ordinarily this method stores bytes from the given array into this
* stream's buffer, flushing the buffer to the underlying output stream as
* needed. If the requested length is at least as large as this stream's
* buffer, however, then this method will flush the buffer and write the
* bytes directly to the underlying output stream. Thus redundant
* <code>BufferedOutputStream</code>s will not copy data unnecessarily.
*
* @param b the data.
* @param off the start offset in the data.
* @param len the number of bytes to write.
* @exception IOException if an I/O error occurs.
*/
public synchronized void write(byte b[], int off, int len) throws IOException {
if (len >= buf.length) {
/* If the request length exceeds the size of the output buffer,
flush the output buffer and then write the data directly.
In this way buffered streams will cascade harmlessly. */
flushBuffer();
out.write(b, off, len);
return;
}
if (len > buf.length - count) {
flushBuffer();
}
System.arraycopy(b, off, buf, count, len);
count += len;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With