In response to my answer to a file-reading question, a commenter stated that FileInputStream.read(byte[])
is "not guaranteed to fill the buffer."
File file = /* ... */;
long len = file.length();
byte[] buffer = new byte[(int)len];
FileInputStream in = new FileInputStream(file);
in.read(buffer);
(The code assumes that the file length does not exceed 2GB)
Apart from an IOException, what could cause the read method to not retrieve the entire file contents?
EDIT:
The idea of the code (and the goal of the OP of the question I answered) is to read the entire file into a chunk of memory in one swoop; that's why buffer_size = file_size.
Apart from an IOException, what could cause the read method to not retrieve the entire file contents?
In my own API implementation, and on my home-rolled file system, I simply choose to fill half the buffer... just kidding.
My point is that even if I weren't kidding, technically speaking it wouldn't be a bug. It is a matter of method contract. The contract (documentation) in this case is:
Reads up to b.length bytes of data from this input stream into an array of bytes.
i.e., it gives no guarantees for filling the buffer.
Depending on the API implementation, and perhaps on the file system, the read method may choose not to fill the buffer. It's basically a question of what the contract of the method says.
Bottom line: It probably works, but is not guaranteed to work.
what could cause the read method to not retrieve the entire file contents?
If, for example, the file is fragmented on the filesystem and the low-level implementation knows that it will have to wait for the HD to seek to the next fragment (which takes a LOT of time relative to CPU operations), it would make sense for the read() call to return with part of the buffer unfilled, giving the application the chance to start working on the data it has already received.
Now I don't know whether any implementation actually works like that, but the point is that you must not rely on the buffer being filled, because it's not guaranteed by the API contract.
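To make the contract argument concrete, here is a minimal sketch of a stream that legally returns short reads. ShortReadInputStream and ShortReadDemo are hypothetical names invented for this illustration, not part of the JDK, and the sketch assumes maxChunk is positive. It does not show how FileInputStream behaves; it only shows that an InputStream may return fewer bytes than requested without violating the documented contract.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that never hands back more than maxChunk bytes per call. */
final class ShortReadInputStream extends FilterInputStream {
    private final int maxChunk; // assumed to be > 0

    ShortReadInputStream(InputStream in, int maxChunk) {
        super(in);
        this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Asking the underlying stream for fewer bytes than the caller wanted
        // is allowed: the contract only promises "up to len bytes".
        return super.read(b, off, Math.min(len, maxChunk));
    }
}

final class ShortReadDemo {
    /** Returns what a single read(byte[]) call reports for a 10-byte source. */
    static int singleRead(int maxChunk) throws IOException {
        byte[] buffer = new byte[10];
        try (InputStream in =
                 new ShortReadInputStream(new ByteArrayInputStream(new byte[10]), maxChunk)) {
            // FilterInputStream.read(byte[]) delegates to read(b, 0, b.length).
            return in.read(buffer);
        }
    }
}
```

With this stream, a single in.read(buffer) on a 10-byte buffer reports 4 when maxChunk is 4, so code that ignores the return value would silently treat six unread bytes as data.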
Well, first off, you've set up a false dichotomy. One perfectly normal circumstance is that the buffer won't be filled because there aren't that many bytes left in the file. That is not an IOException, but it doesn't mean the whole file's contents have not been read.
The spec says the method will either return -1 indicating end-of-stream or will block until at least one byte is read. Implementers of InputStream can optimize as they see fit (e.g. a TCP stream might return data as soon as the packet comes in regardless of the caller's choice of buffer size). A FileInputStream might fill the buffer with one block's worth of data. As the caller, you have no idea except that until the method returns -1, you need to keep on reading.
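The "keep on reading" rule can be sketched as a small loop. ReadLoop and its readFully method are hypothetical names for this illustration; the only JDK call relied on is InputStream.read(byte[], int, int). Treating an early end-of-stream as an EOFException is a choice made for this sketch, not something the answers above prescribe.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ReadLoop {
    /**
     * Calls read repeatedly until the buffer is full.
     * Throws EOFException if the stream ends before that happens.
     */
    static void readFully(InputStream in, byte[] buffer) throws IOException {
        int offset = 0;
        while (offset < buffer.length) {
            // Each call may deliver anywhere from 1 to (buffer.length - offset) bytes.
            int n = in.read(buffer, offset, buffer.length - offset);
            if (n == -1) {
                throw new EOFException(
                    "stream ended after " + offset + " of " + buffer.length + " bytes");
            }
            offset += n;
        }
    }
}
```

In the code from the question, replacing in.read(buffer) with ReadLoop.readFully(in, buffer) would either fill the buffer or fail loudly, for example if the file shrank after the buffer was allocated.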
In practice, with your example, the only circumstance I can see where the buffer wouldn't be filled (with a standard implementation) is if the file changed size after you allocated the buffer but before you started reading it. Since you haven't locked the file down, this is a possibility.
People have talked about read on a FileInputStream as hypothetically not filling the buffer. In fact it is a reality in some circumstances:
If you open a FileInputStream on a "/dev/tty" or a named pipe, then a read will only return you the data that is currently available. Other device files may behave the same way. (These files will probably return 0L as the file size though.)
A FUSE file system can be implemented to not completely fill the read buffer if the file system has been mounted with the direct_io option, or a file is opened with the corresponding flag.
The above apply to Linux, but there could well be similar cases for other operating systems and/or Java implementations. The bottom line is that the javadocs allow this behavior and you can get into trouble if your application assumes that it won't occur.
There are 3rd party libraries that implement "read fully" behavior; e.g. Apache Commons IO provides FileUtils.readFileToByteArray, IOUtils.toByteArray, and similar methods. If you want or need that behavior, you should use one of those libraries or implement it yourself.
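As an aside that the answers above do not mention, the JDK itself also has calls with "read fully" semantics. The sketch below shows two of them: Files.readAllBytes (Java 7 and later) and DataInputStream.readFully. WholeFile and its method names are hypothetical, and like the code in the question, the second method assumes the file fits in an int-sized array and does not change size while being read.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

final class WholeFile {
    /** Lets the JDK size the array and loop until end of file. */
    static byte[] viaNio(File file) throws IOException {
        return Files.readAllBytes(file.toPath());
    }

    /** Keeps the question's pre-sized buffer but uses a call that fills it or throws. */
    static byte[] viaReadFully(File file) throws IOException {
        byte[] buffer = new byte[(int) file.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // readFully throws EOFException if the stream ends before the buffer is full.
            in.readFully(buffer);
        }
        return buffer;
    }
}
```

Both variants close the stream, which the snippet in the question does not do.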
It's not guaranteed to fill the buffer.
The file size may be smaller than the buffer, or the remainder of the file may be smaller than the buffer.
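The "remainder smaller than the buffer" case is easy to observe by recording what each read call returns. This is a small sketch with hypothetical names (ChunkSizes, chunkSizes); it assumes bufferSize is positive and uses an in-memory stream in the usage note rather than a real file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class ChunkSizes {
    /** Reads the stream to the end and records how many bytes each read call returned. */
    static List<Integer> chunkSizes(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize]; // bufferSize assumed > 0
        List<Integer> sizes = new ArrayList<>();
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Only the first n bytes of buffer are valid after this call.
            sizes.add(n);
        }
        return sizes;
    }
}
```

For a 10-byte ByteArrayInputStream and a 4-byte buffer, the recorded sizes are 4, 4 and 2: the last call leaves half the buffer untouched without any exception being thrown.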