Trying to understand Java String implementation

I'm looking at the OpenJDK implementation of String, and the private, per-instance members look like this:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    // [...]
}

But I know that Java uses references and pools for Strings to avoid duplication. I was naively expecting a pimpl idiom, where String would in fact be just a ref to an impl. I'm not seeing that so far. Can someone explain how Java knows to use references if I put a String x; member in one of my classes?

Addendum: this is probably wrong, but if I'm in 32-bit mode, should I count 4 bytes for the reference "value[]", 4 bytes for offset, 4 for count and 4 for hash for every instance of class String? That would mean that writing "String x;" in one of my classes automatically adds at least 32 bytes to the "weight" of my class (I'm probably wrong here).

Frank asked Aug 17 '12 16:08


2 Answers

The offset/count fields are somewhat orthogonal to the pooling/intern() issues. Offset and count come into play when you have something like:

String substring = myString.substring(5);

One way to implement this method would be something like:

  • allocate a new char[] with myString.length() - 5 elements
  • copy the elements of myString from index 5 up to (but not including) myString.length() into the new char[]
  • substring is constructed with this new char[]
    • substring.charAt(i) goes directly to chars[i]
    • substring.length() goes directly to chars.length
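The copying strategy in the list above can be sketched in a few lines. This is a minimal illustration, not the JDK's String: the class CopyingString and its methods (including the sharesStorageWith helper) are hypothetical names chosen for this example, and Arrays.copyOfRange stands in for the "allocate a new char[] and copy" steps.

```java
import java.util.Arrays;

// Hypothetical class illustrating the copying approach; NOT the JDK's String.
final class CopyingString {
    private final char[] chars;   // this string owns its own array

    CopyingString(String s) {
        this.chars = s.toCharArray();
    }

    private CopyingString(char[] chars) {
        this.chars = chars;
    }

    // O(N) in the new length: allocates a new char[] and copies into it,
    // so two allocations happen (the array and the CopyingString).
    CopyingString substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > chars.length) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        char[] copy = Arrays.copyOfRange(chars, beginIndex, chars.length);
        return new CopyingString(copy);
    }

    char charAt(int i) {
        return chars[i];          // index straight into the private array
    }

    int length() {
        return chars.length;      // the array length is the string length
    }

    // Hypothetical helper: true if both instances use the same array object.
    boolean sharesStorageWith(CopyingString other) {
        return chars == other.chars;
    }

    @Override
    public String toString() {
        return new String(chars);
    }

    public static void main(String[] args) {
        CopyingString s = new CopyingString("Hello, world");
        CopyingString sub = s.substring(5);
        // prints: , world 7 false
        System.out.println(sub + " " + sub.length() + " " + sub.sharesStorageWith(s));
    }
}
```

The `false` at the end is the point: the substring has its own freshly copied array.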

As you can see, this approach is O(N) (where N is the new string's length) and requires two allocations: the new String and the new char[]. So instead, substring works by reusing the original char[] with an offset:

  • substring.offset = myString.offset + newOffset (newOffset is the argument passed to substring, 5 in this example)
  • substring.count = myString.count - newOffset
  • use myString.chars as the chars array for substring
    • substring.charAt(i) goes to chars[i+substring.offset]
    • substring.length() goes to substring.count
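The offset/count strategy in the list above can be sketched the same way. Again, SharingString and sharesStorageWith are hypothetical names for illustration; the fields mirror the value/offset/count members quoted in the question, but this is a simplified model, not the OpenJDK source. For context, later OpenJDK releases (from 7u6 onward) dropped offset and count and went back to copying in substring, partly because a tiny substring could otherwise keep a huge char[] reachable.

```java
// Hypothetical class illustrating the offset/count sharing approach;
// a simplified model, NOT the actual OpenJDK String.
final class SharingString {
    private final char[] chars;   // possibly shared with other SharingString instances
    private final int offset;     // first index in chars that belongs to this string
    private final int count;      // number of chars in this string

    SharingString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharingString(char[] chars, int offset, int count) {
        this.chars = chars;
        this.offset = offset;
        this.count = count;
    }

    // O(1): one new SharingString, no new array, no copying.
    SharingString substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > count) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        // offsets accumulate, so chained substrings still index the original array
        return new SharingString(chars, offset + beginIndex, count - beginIndex);
    }

    char charAt(int i) {
        if (i < 0 || i >= count) {
            throw new StringIndexOutOfBoundsException(i);
        }
        return chars[offset + i];  // shift the index into the shared array
    }

    int length() {
        return count;
    }

    // Hypothetical helper: true if both instances use the same array object.
    boolean sharesStorageWith(SharingString other) {
        return chars == other.chars;
    }

    @Override
    public String toString() {
        return new String(chars, offset, count);
    }

    public static void main(String[] args) {
        SharingString s = new SharingString("Hello, world");
        SharingString sub = s.substring(5).substring(2);
        // prints: world 5 true
        System.out.println(sub + " " + sub.length() + " " + sub.sharesStorageWith(s));
    }
}
```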

Note that we didn't need to create a new char[], and more importantly, we didn't need to copy the chars from the old char[] to the new one (since there is no new one). So this operation is just O(1) and requires only one allocation, that of the new String.

yshavit answered Sep 28 '22 03:09


Java always uses references to any object. There's no way to make it not use references. As for string pooling, that is achieved by the compiler for string literals and at runtime by calling String.intern. It is natural that most of the implementation of String is oblivious to whether it is dealing with an instance referred to by the constant pool or not.
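A small demonstration of both points: a String field only stores a reference, and the pool is involved only for literals and explicit intern() calls. Holder and PoolingDemo are hypothetical names for this example; the == results follow from the Java Language Specification's rules for string literals, `new String(...)` and String.intern().

```java
// Hypothetical demo: String fields are references, and pooling applies to
// literals and intern(), not to every String.
final class PoolingDemo {
    public static void main(String[] args) {
        String literal = "hello";                 // literal: interned by the JVM
        String sameLiteral = "hello";             // same literal -> same pooled instance
        String fresh = new String("hello");       // always a new, non-pooled object
        String interned = fresh.intern();         // returns the pooled instance

        Holder a = new Holder(literal);
        Holder b = new Holder(literal);

        System.out.println(literal == sameLiteral);   // true
        System.out.println(literal == fresh);         // false
        System.out.println(literal == interned);      // true
        System.out.println(a.x == b.x);               // true: both fields refer to one String
        System.out.println(literal.equals(fresh));    // true: same contents
    }
}

final class Holder {
    String x;   // the field holds only a reference; the String object lives elsewhere on the heap

    Holder(String x) {
        this.x = x;
    }
}
```

This also bears on the addendum in the question: the `String x;` field itself costs one reference slot in the containing object, while the String's own fields belong to the (possibly shared) String object.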

Marko Topolnik answered Sep 28 '22 02:09