I understand that using a BufferedReader (wrapping a FileReader) is going to be significantly slower than using a BufferedInputStream (wrapping a FileInputStream), because the raw bytes have to be converted to characters. But I don't understand why it is so much slower! Here are the two code samples that I'm using:
BufferedInputStream inputStream = new BufferedInputStream(new FileInputStream(filename));
try {
    byte[] byteBuffer = new byte[bufferSize];
    int numberOfBytes;
    do {
        numberOfBytes = inputStream.read(byteBuffer, 0, bufferSize);
    } while (numberOfBytes >= 0);
}
finally {
    inputStream.close();
}
and:
BufferedReader reader = new BufferedReader(new FileReader(filename), bufferSize);
try {
    char[] charBuffer = new char[bufferSize];
    int numberOfChars;
    do {
        numberOfChars = reader.read(charBuffer, 0, bufferSize);
    } while (numberOfChars >= 0);
}
finally {
    reader.close();
}
I've tried tests using various buffer sizes, all with a 150 megabyte file. Here are the results (buffer size is in bytes; times are in milliseconds):
Buffer size (bytes)   BufferedInputStream (ms)   BufferedReader (ms)
              4,096                        145                   497
              8,192                        125                   465
             16,384                         95                   515
             32,768                         74                   506
             65,536                         64                   531
As can be seen, the fastest time for the BufferedInputStream (64 ms) is seven times faster than the fastest time for the BufferedReader (465 ms). As I stated above, I don't have an issue with a significant difference; but this much difference just seems unreasonable.
My question is: does anyone have a suggestion for how to improve the performance of the BufferedReader, or an alternative mechanism?
Efficiency: BufferedReader is much more efficient than an unbuffered FileReader. Without buffering, each read call on a FileReader could cause bytes to be read from the file, converted into characters, and then returned.
The main difference between BufferedReader and BufferedInputStream is that BufferedReader reads characters (text), whereas the BufferedInputStream reads raw bytes. The Java BufferedReader class is a subclass of the Java Reader class, so you can use a BufferedReader anywhere a Reader is required.
BufferedReader is a bit faster than Scanner because Scanner parses the input data, whereas BufferedReader simply reads a sequence of characters.
BufferedReader reads a block of characters from the underlying stream and stores them in a buffer, so later reads are served from memory. An InputStreamReader used on its own has no such character buffer: each read() call may go back to the underlying byte stream and convert bytes to characters again.
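The layering described above can be sketched as follows. This is only an illustration, not code from the question: the class name BufferedDecodeSketch and the method countChars are made up for this example, and UTF-8 is an assumed encoding. The InputStreamReader does the byte-to-char conversion and the BufferedReader adds the character buffer on top of it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this sketch only.
final class BufferedDecodeSketch {

    // Counts the characters in a UTF-8 byte stream (assumed encoding),
    // reading in chunks through a BufferedReader.
    static long countChars(InputStream in, int bufferSize) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), bufferSize)) {
            char[] chunk = new char[bufferSize];
            long total = 0;
            int n;
            // read() returns -1 once the end of the stream is reached
            while ((n = reader.read(chunk, 0, chunk.length)) >= 0) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 13 bytes in UTF-8, but 12 characters: the e-acute takes two bytes
        byte[] data = "h\u00e9llo\nworld\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countChars(new ByteArrayInputStream(data), 8192));
    }
}
```

Passing an explicit charset avoids depending on the platform default, which is what FileReader(String) uses on older Java versions.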
The BufferedReader has to convert the bytes into chars. This byte-by-byte parsing and copying to a larger type is expensive relative to a straight copy of blocks of data. Timing the decoding step on its own, with the data already in memory:
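import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// 150 MB of newline bytes, already in memory (no file I/O involved)
byte[] bytes = new byte[150 * 1024 * 1024];
Arrays.fill(bytes, (byte) '\n');
// Repeat the measurement so later runs reflect warmed-up (JIT-compiled) code
for (int i = 0; i < 10; i++) {
    long start = System.nanoTime();
    StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    long time = System.nanoTime() - start;
    System.out.printf("Time to decode %,d MB was %,d ms%n",
            bytes.length / 1024 / 1024, time / 1000000);
}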
prints
Time to decode 150 MB was 226 ms
Time to decode 150 MB was 167 ms
NOTE: Doing this decoding interleaved with system calls can slow down both operations, as system calls can disturb the cache.
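As a possible alternative mechanism, the block decoding timed above can also be applied to a stream one chunk at a time. The following is a minimal sketch under stated assumptions: the class name ChunkedDecodeSketch and the method decodeAll are hypothetical, the input is assumed to be UTF-8, and bufferSize is assumed to be at least 4 so that a multi-byte sequence split across two reads always fits. Whether this is actually faster than a BufferedReader would have to be measured on the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this sketch only.
final class ChunkedDecodeSketch {

    // Reads raw bytes in blocks and decodes each block with a CharsetDecoder.
    // Returns the number of chars decoded. Assumes UTF-8 and bufferSize >= 4.
    static long decodeAll(InputStream in, int bufferSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);
        long total = 0;
        try {
            boolean eof = false;
            while (!eof) {
                // Fill the free part of the byte buffer straight from the stream
                int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
                if (n < 0) {
                    eof = true;
                } else {
                    bytes.position(bytes.position() + n);
                }
                bytes.flip();
                CoderResult result;
                do {
                    result = decoder.decode(bytes, chars, eof);
                    if (result.isError()) {
                        result.throwException(); // malformed or unmappable input
                    }
                    // A real caller would consume the decoded chars here
                    total += chars.position();
                    chars.clear();
                } while (result.isOverflow());
                // Keep any incomplete multi-byte sequence for the next read
                bytes.compact();
            }
            decoder.flush(chars);
            total += chars.position();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "abc\u00e9".getBytes(StandardCharsets.UTF_8);
        // Buffer of 4 bytes forces the two-byte e-acute to straddle two reads
        System.out.println(decodeAll(new ByteArrayInputStream(data), 4));
    }
}
```

The compact() call is what carries a partially read character over to the next block; without it, a character split across two reads would be reported as malformed.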
In the BufferedReader implementation there is a fixed constant, defaultExpectedLineLength = 80, which is used in the readLine method when allocating the StringBuffer. If you have a big file with lots of lines longer than 80 characters, this fragment might be something that can be improved:
if (s == null)
    s = new StringBuffer(defaultExpectedLineLength);
s.append(cb, startChar, i - startChar);
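One way to act on this, sketched here as an assumption rather than as a drop-in replacement for readLine, is to split lines yourself with a builder whose initial capacity matches the expected line length and which is reused between lines. The class name LongLineReaderSketch, the method readLines, and the 8192-char chunk size are all made up for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this sketch only.
final class LongLineReaderSketch {

    // Splits the reader's content into lines on \n, \r, or \r\n.
    // expectedLineLength sizes the builder once; it is then reused per line.
    static List<String> readLines(Reader in, int expectedLineLength) {
        List<String> lines = new ArrayList<>();
        char[] chunk = new char[8192];
        StringBuilder line = new StringBuilder(expectedLineLength);
        boolean skipLf = false; // true right after a '\r'
        try {
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) >= 0) {
                int start = 0;
                for (int i = 0; i < n; i++) {
                    char c = chunk[i];
                    if (skipLf) {
                        skipLf = false;
                        if (c == '\n') {       // second half of "\r\n"
                            start = i + 1;
                            continue;
                        }
                    }
                    if (c == '\n' || c == '\r') {
                        line.append(chunk, start, i - start);
                        lines.add(line.toString());
                        line.setLength(0);     // keeps the allocated capacity
                        start = i + 1;
                        skipLf = (c == '\r');
                    }
                }
                // Carry the unterminated tail of this chunk into the next one
                line.append(chunk, start, n - start);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line.length() > 0) {
            lines.add(line.toString());        // last line without a terminator
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines(new StringReader("a\r\nbb\n\nccc"), 256));
    }
}
```

Whether the larger initial capacity makes a measurable difference depends on the file; it only avoids the repeated growth of the builder for long lines.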