If I read bytes from a file into a byte[], I see that FileInputStream performs worse when the array is around 1 MB compared to 128 KB. On the two workstations I have tested, it is almost twice as fast with 128 KB. Why is that?
import java.io.*;

public class ReadFileInChuncks
{
    public static void main(String[] args) throws IOException
    {
        byte[] buffer1 = new byte[1024 * 128];
        byte[] buffer2 = new byte[1024 * 1024];
        String path = "some 1 gb big file";

        readFileInChuncks(path, buffer1, false); // warm-up, not reported
        readFileInChuncks(path, buffer1, true);
        readFileInChuncks(path, buffer2, true);
        readFileInChuncks(path, buffer1, true);
        readFileInChuncks(path, buffer2, true);
    }

    public static void readFileInChuncks(String path, byte[] buffer, boolean report) throws IOException
    {
        long t = System.currentTimeMillis();
        try (InputStream is = new FileInputStream(path))
        {
            while (readToArray(is, buffer) != 0) {}
        }
        if (report)
            System.out.println((System.currentTimeMillis() - t) + " ms");
    }

    // Fills the buffer completely unless EOF is reached first;
    // returns the number of bytes actually read.
    public static int readToArray(InputStream is, byte[] buffer) throws IOException
    {
        int index = 0;
        while (index != buffer.length)
        {
            int read = is.read(buffer, index, buffer.length - index);
            if (read == -1)
                break;
            index += read;
        }
        return index;
    }
}
outputs
422 ms
717 ms
422 ms
718 ms
Note that this is a repost of an already asked question. The other one was polluted with unrelated discussion, so I will mark it for deletion.
Edit: Duplicate, really? I could surely write better code to prove my point, but this does not answer my question.
Edit 2: I ran the test with every buffer size between 5 KB and 1000 KB on Win7 / JRE 1.8.0_25, and the bad performance starts at precisely 508 KB and continues for all larger sizes. Sorry for the bad diagram legends; x is buffer size, y is milliseconds.
The read() method of a FileInputStream returns an int containing the value of the byte read. If read() returns -1, there is no more data in the stream and it can be closed. Note that -1 is returned as an int value, not as a byte value.
With a BufferedInputStream, the method delegates to an overloaded read() that reads up to 8192 bytes at a time and buffers them until they are needed. It still returns only a single byte (but keeps the others in reserve). This way a BufferedInputStream makes fewer native calls to the OS to read from the file.
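The buffering behavior described above can be sketched as follows. This is a self-contained demo (the class name and the temp file are illustrative; 8192 bytes is BufferedInputStream's real default buffer size):

```java
import java.io.*;
import java.nio.file.*;

public class BufferedReadDemo {
    // Counts bytes by reading one at a time; BufferedInputStream refills its
    // internal 8192-byte buffer from the OS only when it runs empty, so most
    // read() calls never touch the file system.
    static long countBytes(Path file) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(file.toFile()), 8192)) {
            // read() returns one byte as an int in 0..255, or -1 at end of stream
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Create a small sample file so the demo is runnable as-is.
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[100_000]);
        System.out.println(countBytes(tmp)); // prints 100000
        Files.delete(tmp);
    }
}
```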
TL;DR The performance drop is caused by memory allocation, not by file reading issues.
A typical benchmarking problem: you benchmark one thing, but actually measure another.
First of all, when I rewrote the sample code using RandomAccessFile, FileChannel and ByteBuffer.allocateDirect, the threshold disappeared. File reading performance became roughly the same for 128 KB and 1 MB buffers.
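A minimal sketch of such a rewrite, assuming the class name and helper method are illustrative (the source only says RandomAccessFile, FileChannel and ByteBuffer.allocateDirect were used):

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithChannel {
    // Reads the whole file through a direct ByteBuffer: the channel copies
    // straight into native memory, so FileInputStream's temporary
    // malloc'ed buffer never comes into play. Returns total bytes read.
    static long readFully(String path, int bufferSize) throws Exception {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            while (ch.read(buf) != -1) {
                total += buf.position();
                buf.clear(); // reuse the same direct buffer for the next chunk
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Small self-contained check against a temp file.
        Path tmp = Files.createTempFile("channel", ".bin");
        Files.write(tmp, new byte[1024 * 300]);
        System.out.println(readFully(tmp.toString(), 1024 * 128)); // prints 307200
        Files.delete(tmp);
    }
}
```

Timing readFully with 128 KB and 1 MB values of bufferSize is one way to reproduce the "threshold disappears" observation.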
Unlike direct ByteBuffer I/O, FileInputStream.read cannot load data directly into a Java byte array. It first needs to read the data into a native buffer, and then copy it into the Java array using the JNI SetByteArrayRegion function.
So we have to look at the native implementation of FileInputStream.read. It comes down to the following piece of code in io_util.c:
if (len == 0) {
    return 0;
} else if (len > BUF_SIZE) {
    buf = malloc(len);
    if (buf == NULL) {
        JNU_ThrowOutOfMemoryError(env, NULL);
        return 0;
    }
} else {
    buf = stackBuf;
}
Here BUF_SIZE == 8192. If the buffer is larger than this reserved stack area, a temporary buffer is allocated with malloc. On Windows, malloc is usually implemented via the HeapAlloc WINAPI call.
Next, I measured the performance of HeapAlloc + HeapFree calls alone, without any file I/O. The results were interesting:
128K: 5 μs
256K: 10 μs
384K: 15 μs
512K: 20 μs
640K: 25 μs
768K: 29 μs
896K: 33 μs
1024K: 316 μs <-- almost 10x leap
1152K: 356 μs
1280K: 399 μs
1408K: 436 μs
1536K: 474 μs
1664K: 511 μs
1792K: 553 μs
1920K: 592 μs
2048K: 628 μs
As you can see, the performance of OS memory allocation changes drastically at the 1 MB boundary. This can be explained by different allocation algorithms being used for small chunks and for large chunks.
UPDATE
The documentation for HeapCreate confirms the idea of a specific allocation strategy for blocks larger than 1 MB (see the description of dwMaximumSize).
Also, the largest memory block that can be allocated from the heap is slightly less than 512 KB for a 32-bit process and slightly less than 1,024 KB for a 64-bit process.
...
Requests to allocate memory blocks larger than the limit for a fixed-size heap do not automatically fail; instead, the system calls the VirtualAlloc function to obtain the memory that is needed for large blocks.
Optimal buffer size depends on the file system block size, CPU cache size and cache latency. Most OSes use a block size of 4096 or 8192 bytes, so it is recommended to use a buffer of that size or a multiple of it.
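For example, a small helper that rounds a requested buffer size up to a multiple of an assumed 8192-byte block size (the class name and the BLOCK constant are illustrative, not from the source):

```java
public class BufferSizing {
    static final int BLOCK = 8192; // assumed file-system block size

    // Round the requested size up to the next multiple of BLOCK,
    // so reads stay aligned with file-system blocks.
    static int roundUp(int requested) {
        return ((requested + BLOCK - 1) / BLOCK) * BLOCK;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(100_000)); // prints 106496 (13 * 8192)
        System.out.println(roundUp(8192));    // prints 8192 (already aligned)
    }
}
```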