
Why does char[] perform better than String? - Java

In reference to the link File IO Tuning: in its last section, titled "Further Tuning", the author suggests using char[] to avoid generating a String object for each of the n lines in the file. I need to understand how

char[] arr = new char[]{'a', 'u', 't', 'h', 'o', 'r'};

differs from

String s = "author";

in terms of memory consumption or any other performance factor. Isn't a String object internally stored as a character array? I feel silly since I never thought of this before. :-)

asked Nov 18 '11 23:11 by name_masked


2 Answers

In Oracle's JDK (as of this writing), a String has four instance-level fields:

  • A character array
  • An integral offset
  • An integral character count
  • An integral hash value

That means that each String introduces an extra object reference (the String itself), and three integers in addition to the character array itself. (The offset and character count are there to allow sharing of the character array among String instances produced through the String#substring() methods, a design choice that some other Java library implementers have eschewed.) Beyond the extra storage cost, there's also one more level of access indirection, not to mention the bounds checking with which the String guards its character array.

If you can get away with allocating and consuming just the basic character array, there's space to be saved there. It's certainly not idiomatic to do so in Java though; judicious comments would be warranted to justify the choice, preferably with mention of evidence from having profiled the difference.
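To make the layout described above concrete, here is a minimal sketch of a string-like class with the same four fields. SketchString and everything in it are hypothetical names for illustration; this is not the real java.lang.String source, only a stand-in that assumes the field layout listed above.

```java
// Hypothetical stand-in for the four-field layout described above.
// This is NOT the actual java.lang.String implementation.
final class SketchString {
    private final char[] value; // backing array, possibly shared between instances
    private final int offset;   // index of this string's first character in value
    private final int count;    // number of characters belonging to this string
    private int hash;           // cached hash code; 0 means "not computed yet"

    SketchString(char[] value, int offset, int count) {
        if (offset < 0 || count < 0 || offset > value.length - count) {
            throw new StringIndexOutOfBoundsException();
        }
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    // Every access pays for a bounds check plus one extra indirection
    // (object -> array) compared with indexing a bare char[].
    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[offset + index];
    }

    // Shares the backing array instead of copying it; only the small
    // wrapper object is newly allocated.
    SketchString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SketchString(value, offset + begin, end - begin);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (int i = 0; i < count; i++) {
                h = 31 * h + value[offset + i];
            }
            hash = h; // cache for later calls
        }
        return h;
    }
}
```

Each SketchString costs one object header, one reference, and three ints on top of the char[] it points at, which is the overhead a bare char[] avoids.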

answered Oct 18 '22 07:10 by seh


In the example you've referred to, it's because there's only a single character array being allocated for the whole loop. It's repeatedly reading into that same array, and processing it in place.

Compare that with using readLine, which needs to create a new String instance on each iteration. Each String instance will contain a few int fields and a reference to a char[] containing the actual data, so it would need two new objects per iteration: the String and its char[].

I'd usually expect the differences to be insignificant (with a decent GC throwing away unused "young" objects very efficiently) compared with the IO involved in reading the data - assuming it's from disk - but I believe that's the point the author was trying to make.
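As a rough sketch of the two approaches being compared (not the code from the linked article), the following counts lines both ways. The class name LineCounting, the method names, and the 8192-character buffer size are assumptions chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for illustration; names and buffer size are assumptions.
final class LineCounting {

    // Allocates a new String (plus its backing array) for every line read.
    static int countWithReadLine(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }

    // Allocates a single char[] up front and reuses it for the whole input.
    // Note: this counts '\n' terminators only, so it differs from readLine
    // for an unterminated last line or for '\r'-only line endings.
    static int countWithCharBuffer(Reader in) throws IOException {
        char[] buffer = new char[8192];
        int lines = 0;
        int read;
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }
}
```

Passing a FileReader (or any other Reader) to either method works; whether the second version is measurably faster on a given workload is something to confirm by profiling rather than assume.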

answered Oct 18 '22 05:10 by Jon Skeet