Seeking out the optimum size for BufferedInputStream in Java

I was profiling my code that was loading a binary file. The load time was something around 15 seconds.

The majority of my load time was coming from the methods that were loading binary data.

I had the following code to create my DataInputStream:

is = new DataInputStream(
     new GZIPInputStream(
     new FileInputStream("file.bin")));

And I changed it to this:

is = new DataInputStream(
     new BufferedInputStream(
     new GZIPInputStream(
     new FileInputStream("file.bin"))));

So after this small modification, the load time went from 15 seconds to 4 seconds.

But then I found that BufferedInputStream has two constructors. The other constructor lets you explicitly define the buffer size.
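
For example, I assume I could pass the size explicitly like this (64 KB is just an arbitrary number for illustration):

is = new DataInputStream(
     new BufferedInputStream(
     new GZIPInputStream(
     new FileInputStream("file.bin")), 64 * 1024));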

I've got two questions:

  1. What buffer size does BufferedInputStream choose by default, and is it ideal? If not, how can I find the optimum size for the buffer? Should I write a quick bit of code that does a binary search?
  2. Is this the best way I can use BufferedInputStream? I originally had it inside the GZIPInputStream, but there was negligible benefit. I'm assuming that what the code does now is: every time the buffer needs to be filled, the GZIPInputStream decodes x bytes (where x is the size of the buffer). Would it be worth omitting the GZIPInputStream entirely? It's definitely not needed, but my file size decreases dramatically when using it.
asked Dec 14 '10 by Brad


2 Answers

Both the GZIPInputStream and the BufferedInputStream use an internal buffer. That is why putting a BufferedInputStream inside the GZIPInputStream doesn't provide any benefit. The problem with the GZIPInputStream is that it doesn't buffer the output it generates, which is why your current version is much faster.

The default buffer size for BufferedInputStream is 8 KB, so you can try increasing or decreasing that to see if it helps. I doubt that the exact number matters much, so simply multiplying or dividing by two should be enough.

If the file is small, you can also try to buffer it completely. This should give you the best performance in theory. You could also try to increase the buffer size of the GZIPInputStream (by default 512 bytes), since this might speed up reading from disk.
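
As a rough sketch (these sizes are just starting points to experiment with, not recommendations):

is = new DataInputStream(
     new BufferedInputStream(
     new GZIPInputStream(
     new FileInputStream("file.bin"), 64 * 1024),  // GZIP input buffer, default is 512 bytes
     64 * 1024));                                  // BufferedInputStream buffer, default is 8 KB

To buffer the file completely, you could read it into a byte array first and wrap it in a ByteArrayInputStream before the GZIPInputStream.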

answered Sep 21 '22 by Marc

  1. Don't bother with a coded binary search. Just try some values by hand and compare the timings (you can do a manual binary search if you like; a small timing sketch follows after this list). You'll most likely find that a very wide range of buffer sizes gives you close-to-optimal performance, so pick the smallest one that does the trick.

  2. What you have is the correct order:

    is = new DataInputStream(
         new BufferedInputStream(
         new GZIPInputStream(
         new FileInputStream("file.bin"))));
    

    There is little point in putting a BufferedInputStream inside the GZIPInputStream since the latter already buffers its input (but not its output).

    Removing GZIPInputStream might be a win, but will most likely be detrimental to performance if the data has to be read from disk and is not resident in the filesystem cache. The reason is that reading from disk is very slow and decompressing gzip is very fast. Therefore it is generally cheaper to read less data from disk and decompress it in memory than it is to read more data from disk.
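
For point 1, a minimal timing sketch along these lines should be enough (it assumes it runs in a method that declares throws IOException and simply drains the stream):

// needs java.io.* and java.util.zip.GZIPInputStream
int[] sizes = {4 * 1024, 8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024, 128 * 1024};
for (int size : sizes) {
    long start = System.nanoTime();
    DataInputStream in = new DataInputStream(
            new BufferedInputStream(
            new GZIPInputStream(
            new FileInputStream("file.bin")), size));
    try {
        byte[] scratch = new byte[8192];
        // drain the stream; we only care about the read/decompress time
        while (in.read(scratch) != -1) {
        }
    } finally {
        in.close();
    }
    System.out.println(size + " bytes: " + (System.nanoTime() - start) / 1000000L + " ms");
}

Run it a couple of times so the OS file cache is in a comparable state between runs.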

answered Sep 24 '22 by NPE