
Java performance of byte[] vs. char[] for file stream

Tags: java, io

I'm writing a program that reads a file through a custom 8 KB buffer and then searches that buffer for a keyword. Since Java provides two types of streams, character and byte, I've implemented this using both byte[] and char[] for buffering.

I wonder which would be faster. A char is 2 bytes, and when a Reader fills a char[] it has to decode the underlying bytes into chars, which I think could make it slower than using only byte[].

Genzer asked Aug 15 '11 03:08


1 Answer

Using a byte array will be faster:

  • You skip the byte-to-character decoding step, which is at least a copy loop and possibly more, depending on the Charset used for decoding.

  • A byte array takes half the space of a char array with the same number of elements, which saves CPU cycles in GC and initialization.
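To make the two points above concrete, here is a minimal sketch of the two read loops. The class name BufferReadSketch, the method names, and the in-memory sample data are made up for illustration; a real program would wrap a FileInputStream instead. It assumes the input is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BufferReadSketch {
    // 8 KB buffer, as in the question.
    static final int BUFFER_SIZE = 8 * 1024;

    // Reads raw bytes into a byte[]; no decoding takes place.
    public static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Reads decoded chars into a char[]; the Reader has to turn
    // bytes into UTF-16 code units before it can fill the buffer.
    public static long countChars(Reader reader) throws IOException {
        char[] buf = new char[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = reader.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // "\u00EF" (i with diaeresis) takes two bytes in UTF-8 but one char.
        byte[] data = "na\u00EFve keyword".getBytes(StandardCharsets.UTF_8);

        long bytes = countBytes(new ByteArrayInputStream(data));
        long chars = countChars(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8));

        System.out.println(bytes + " bytes, " + chars + " chars"); // prints "14 bytes, 13 chars"
    }
}
```

The two loops look the same; the extra work in the char[] version is hidden inside InputStreamReader, and each element of the char[] buffer occupies two bytes instead of one.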

However:

  • Unless you are searching huge files, the difference is unlikely to be significant.

  • The byte array approach could fail if the input file is not encoded in an 8-bit character set. Even where it works (as it does for UTF-8 and UTF-16), there are potential issues with matching characters that span buffer boundaries.

(The reason that byte-wise treatment works for UTF-8 and UTF-16 is that the encoding makes it easy to distinguish between the first unit (byte or short) and subsequent units of an encoded character.)
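One way to deal with matches that span buffer boundaries is to carry the last keyword.length - 1 bytes of each chunk over to the front of the next one. The following is a rough sketch of that idea, not a tuned implementation: the class StreamKeywordSearch and its methods are hypothetical names, the inner search is a naive scan, and it assumes the keyword has been encoded with the same charset as the file (UTF-8 here).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamKeywordSearch {

    // Returns true if the keyword bytes occur anywhere in the stream,
    // including across the boundary between two reads.
    public static boolean contains(InputStream in, byte[] keyword, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        if (keyword.length == 0) {
            return true;
        }
        int overlap = keyword.length - 1;          // bytes to keep between reads
        byte[] buf = new byte[overlap + bufferSize];
        int carried = 0;                           // bytes kept from the previous chunk
        int n;
        while ((n = in.read(buf, carried, bufferSize)) != -1) {
            int valid = carried + n;
            if (indexOf(buf, valid, keyword) >= 0) {
                return true;
            }
            // Move the tail of this chunk to the front so that a match
            // starting here can be completed by the next read.
            carried = Math.min(overlap, valid);
            System.arraycopy(buf, valid - carried, buf, 0, carried);
        }
        return false;
    }

    // Naive search over the first 'length' bytes of haystack.
    static int indexOf(byte[] haystack, int length, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "log line one\nfind the keyword here".getBytes(StandardCharsets.UTF_8);
        byte[] keyword = "keyword".getBytes(StandardCharsets.UTF_8);
        // A tiny buffer size forces the keyword to straddle several reads.
        System.out.println(contains(new ByteArrayInputStream(data), keyword, 4));
    }
}
```

Because the keyword is compared as a whole byte sequence, a multi-byte character split across two reads is still matched once its trailing bytes arrive in the next chunk.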

Stephen C answered Oct 22 '22 05:10