Is the String Literal Pool a collection of references to String objects, or a collection of objects?

I am confused after reading the article "Strings, Literally" on the JavaRanch site by Corey McGlone (author of The SCJP Tip Line) and the SCJP Java 6 Programmer Guide by Kathy Sierra (co-founder of JavaRanch) and Bert Bates.

I will try to quote what Corey McGlone and Kathy Sierra have written about the String Literal Pool.

1. According to Corey McGlone:

  • The String Literal Pool is a collection of references that point to String objects.

  • String s = "Hello"; (assuming there is no "Hello" object on the heap yet) creates a String object "Hello" on the heap and places a reference to this object in the String Literal Pool (constant table).

  • String a = new String("Bye"); (assuming there is no "Bye" object on the heap yet): the new operator obliges the JVM to create an object on the heap.

The article's explanation of how the "new" operator creates a String and its reference is a bit confusing, so I am quoting the code and explanation from the article as-is below.

```java
public class ImmutableStrings {
    public static void main(String[] args) {
        String one = "someString";
        String two = new String("someString");

        System.out.println(one.equals(two));
        System.out.println(one == two);
    }
}
```

In this case, we actually end up with a slightly different behavior because of the keyword "new." In such a case, references to the two String literals are still put into the constant table (the String Literal Pool), but, when you come to the keyword "new," the JVM is obliged to create a new String object at run-time, rather than using the one from the constant table.

[Diagram from the article explaining this case; image not reproduced]

So does this mean that the String Literal Pool also holds a reference to this object?

Here is the link to the article by Corey McGlone:

http://www.javaranch.com/journal/200409/Journal200409.jsp#a1

2. According to Kathy Sierra and Bert Bates in the SCJP book:

  • To make Java more memory efficient, the JVM sets aside a special area of memory called the "String constant pool". When the compiler encounters a String literal, it checks the pool to see if an identical String already exists. If not, it creates a new String literal object.

  • String s = "abc"; // Creates one String object and one reference variable.

    That's fine, but then I was confused by this statement:

  • String s = new String("abc"); // Creates two objects, and one reference variable.

    The book says that in this case a new String object is created in normal (non-pool) memory and s will refer to it, whereas the literal "abc" will additionally be placed in the pool.

    These lines in the book conflict with the article by Corey McGlone.

    • If the String Literal Pool is a collection of references to String objects, as Corey McGlone says, then why would the literal object "abc" be placed in the pool (as the book says)?

    • And where does this String Literal Pool reside?

Please clear up this doubt. It may not matter much when writing code, but it is very important from the memory-management point of view, and that's why I want to get this fundamental concept straight.

Asked Jul 28 '12 by Kumar Vivek Mitra




1 Answer

I think the main point to understand here is the distinction between the String Java object and its contents: the char[] stored in the private value field. String is basically a wrapper around a char[] array, encapsulating it and making it impossible to modify, so the String can remain immutable. The String class also remembers which part of this array is actually used (see below). This all means that you can have two different String objects (quite lightweight) pointing to the same char[]. (This describes the String implementation of older JDKs such as Java 6, which had value, offset and count fields; since Java 7u6 the offset and count fields are gone, and since Java 9 value is a byte[].)
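To make the wrapper idea concrete, here is a minimal sketch of a hypothetical LegacyString class (not the real java.lang.String) modeled on the older layout described above: a char[] plus an offset and count, with a copy constructor that shares the array instead of copying it.

```java
import java.util.Arrays;

// Hypothetical sketch of the Java 6-era String layout: an immutable wrapper
// around a char[] plus the offset and count of the part actually in use.
public final class LegacyString {
    private final char[] value;
    private final int offset;
    private final int count;

    public LegacyString(char[] chars) {
        // Defensive copy, so the caller cannot mutate our text later.
        this(Arrays.copyOf(chars, chars.length), 0, chars.length);
    }

    // Copy constructor: shares the backing array instead of copying it,
    // which is safe because nothing ever writes to value.
    public LegacyString(LegacyString original) {
        this(original.value, original.offset, original.count);
    }

    private LegacyString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    public boolean sharesArrayWith(LegacyString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        LegacyString one = new LegacyString(new char[] {'a', 'b', 'c'});
        LegacyString two = new LegacyString(one);
        System.out.println(one == two);               // false: two wrapper objects
        System.out.println(one.sharesArrayWith(two)); // true: one shared char[]
        System.out.println(two);                      // abc
    }
}
```

The wrapper objects themselves are cheap; only the char[] holds the text, which is why sharing it pays off.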

I will show you a few examples, together with the identity hash code (System.identityHashCode()) of each String and the hashCode() of its internal char[] value field (I will call it text to distinguish it from string). Finally, I'll show the javap -c -verbose output, together with the constant pool of my test class. Please do not confuse the class constant pool with the string literal pool. They are not quite the same. See also Understanding javap's output for the Constant Pool.

Prerequisites

For testing, I created the following utility method, which breaks String encapsulation:

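```java
import java.lang.reflect.Field;

// Returns the hashCode() of the String's private value array.
// On JDK 16+ run with --add-opens java.base/java.lang=ALL-UNNAMED.
private int showInternalCharArrayHashCode(String s) throws ReflectiveOperationException {
    final Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);
    return value.get(s).hashCode();
}
```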

It returns the hashCode() of the char[] value array. Since arrays do not override hashCode(), this is an identity-based hash, which effectively helps us tell whether a particular String points to the same char[] text as another one.
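The helper relies on arrays using Object's identity-based hashCode(). The small check below (the class name ArrayHashDemo is just for illustration) shows this. Keep in mind that identity hash codes are not guaranteed to be unique, so matching values strongly suggest, but do not strictly prove, that two Strings share one array.

```java
import java.util.Arrays;

public class ArrayHashDemo {
    public static void main(String[] args) {
        char[] a = {'a', 'b', 'c'};
        char[] b = {'a', 'b', 'c'};

        // Arrays do not override Object.hashCode(), so hashCode() is the identity hash.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true
        // Same contents in different arrays: content-based hashes match.
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));   // true
        // They are still two distinct array objects.
        System.out.println(a == b);                                     // false
    }
}
```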

Two string literals in a class

Let's start from the simplest example.

Java code

String one = "abc"; String two = "abc"; 

By the way, if you simply write "ab" + "c", the Java compiler performs the concatenation at compile time and the generated code is exactly the same. This only works if all operands are compile-time constants.
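The compile-time rule can be checked directly. In the sketch below (class and method names are illustrative), only expressions that qualify as constant expressions under the JLS are folded into the pooled "abc"; a concatenation involving a non-final local variable is evaluated at run time and yields a new object.

```java
public class ConstantFolding {
    static final String FINAL_PART = "ab"; // constant variable: usable in constant expressions

    static boolean literalVsFolded() {
        return "abc" == ("ab" + "c");       // folded by javac into the single constant "abc"
    }

    static boolean literalVsFinalVariable() {
        return "abc" == (FINAL_PART + "c"); // still a constant expression, so also folded
    }

    static boolean literalVsRuntime() {
        String part = "ab";                 // not declared final, so part + "c" runs at run time
        return "abc" == (part + "c");       // run-time concatenation creates a new String
    }

    public static void main(String[] args) {
        System.out.println(literalVsFolded());        // true
        System.out.println(literalVsFinalVariable()); // true
        System.out.println(literalVsRuntime());       // false
    }
}
```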

Class constant pool

Each class has its own constant pool: a list of constant values that can be reused if they occur several times in the source code. It includes string literals, numeric constants, method names, etc.

Here are the contents of the constant pool in our example above.

```
const #2 = String   #38;    //  abc
//...
const #38 = Asciz   abc;
```

The important thing to note is the distinction between the String constant (#2) and the Unicode-encoded text "abc" (#38) that it points to.
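To see the same String-to-text indirection without javap, here is a rough sketch that reads a class file's constant pool and maps each CONSTANT_String entry to the CONSTANT_Utf8 text it references. It assumes Java 9+ (for readAllBytes()) and that the compiled StringConstants.class is reachable as a classpath resource. The class name and output format are my own, and the indices it prints will differ from the javap listing above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StringConstants {
    // Maps each CONSTANT_String index in a class file's constant pool
    // to the text of the CONSTANT_Utf8 entry it points to.
    static Map<Integer, String> read(byte[] classFile) {
        try (DataInputStream data = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (data.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            data.readUnsignedShort();             // minor_version
            data.readUnsignedShort();             // major_version
            int count = data.readUnsignedShort(); // constant_pool_count; valid indices are 1..count-1
            Map<Integer, String> utf8 = new HashMap<>();
            Map<Integer, Integer> strings = new TreeMap<>();
            for (int i = 1; i < count; i++) {
                int tag = data.readUnsignedByte();
                switch (tag) {
                    case 1: utf8.put(i, data.readUTF()); break;              // Utf8: u2 length + modified UTF-8
                    case 8: strings.put(i, data.readUnsignedShort()); break; // String: index of its Utf8 text
                    case 7: case 16: case 19: case 20: data.readFully(new byte[2]); break;
                    case 15: data.readFully(new byte[3]); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        data.readFully(new byte[4]); break;
                    case 5: case 6: data.readFully(new byte[8]); i++; break; // Long/Double occupy two slots
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            Map<Integer, String> result = new TreeMap<>();
            strings.forEach((index, textIndex) -> result.put(index, utf8.get(textIndex)));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String one = "abc"; // both variables compile to ldc of the same String constant
        String two = "abc";
        try (InputStream in = StringConstants.class.getResourceAsStream("StringConstants.class")) {
            if (in == null) {
                System.out.println("StringConstants.class is not available as a resource");
                return;
            }
            read(in.readAllBytes()).forEach((index, text) ->
                    System.out.println("#" + index + " = String -> \"" + text + "\""));
        }
        System.out.println(one == two);
    }
}
```

Only one String entry for "abc" should appear, no matter how many times the literal is used in the class.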

Byte code

Here is the generated bytecode. Note that both the one and two references are assigned the same #2 constant, which points to the "abc" string:

```
ldc #2; //String abc
astore_1    //one
ldc #2; //String abc
astore_2    //two
```

Output

For each example I am printing the following values:

```java
System.out.println(showInternalCharArrayHashCode(one));
System.out.println(showInternalCharArrayHashCode(two));
System.out.println(System.identityHashCode(one));
System.out.println(System.identityHashCode(two));
```

Unsurprisingly, both pairs are equal:

```
23583040
23583040
8918249
8918249
```

This means that not only do both objects point to the same char[] (the same text underneath), so the equals() test passes, but, even more, one and two are the exact same reference! So one == two is true as well. Obviously, if one and two point to the same object, then one.value and two.value must be the same array.
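The same can be checked with plain ==, without breaking encapsulation. The sketch below (names are illustrative) also compares against a literal returned from a nested class, which compiles to its own .class file with its own constant pool; the JLS specifies that identical string literals in different classes still refer to the same String instance.

```java
public class LiteralIdentity {
    // A nested class compiles to a separate .class file with its own constant pool.
    static class Other {
        static String abc() {
            return "abc";
        }
    }

    static boolean sameLiteralSameReference() {
        String one = "abc";
        String two = "abc";
        return one == two; // both ldc the same constant, resolved to one interned String
    }

    static boolean literalFromOtherClassSameReference() {
        return "abc" == Other.abc(); // literals are interned across classes too
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralSameReference());           // true
        System.out.println(literalFromOtherClassSameReference()); // true
    }
}
```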

Literal and new String()

Java code

Now the example we have all been waiting for: one string literal and one new String created from the same literal. How does this work?

String one = "abc"; String two = new String("abc"); 

The fact that "abc" constant is used two times in the source code should give you some hint...

Class constant pool

Same as above.

Byte code

```
ldc #2; //String abc
astore_1    //one

new #3; //class java/lang/String
dup
ldc #2; //String abc
invokespecial   #4; //Method java/lang/String."<init>":(Ljava/lang/String;)V
astore_2    //two
```

Look carefully! The first object is created the same way as above, no surprise: it simply takes a reference to the already created String (#2) from the constant pool. The second object, however, is created via a normal constructor call. But note: the first String is passed as the argument. This can be decompiled to:

```java
String two = new String(one);
```

Output

The output is a bit surprising. The second pair, representing the String object references, is understandable: we created two String objects, one created for us in the constant pool and the second created manually for two. But why on earth does the first pair suggest that both String objects point to the same char[] value array?!

```
41771
41771
8388097
16585653
```

It becomes clear when you look at how the String(String) constructor works (greatly simplified here):

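```java
// Simplified from the String(String) constructor of older JDKs (Java 6 era),
// where String had value, offset and count fields.
public String(String original) {
    this.offset = original.offset;
    this.count = original.count;
    this.value = original.value; // shares the same char[], no copy
}
```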

See? When you create a new String object based on an existing one, it reuses the char[] value. Since Strings are immutable, there is no need to copy a data structure that is known never to be modified.

I think this is the key to your question: even if you have two String objects, they may still point to the same contents. And as you can see, the String object itself is quite small.
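Without reflection you can still observe the object-level difference. In this sketch (class and method names are illustrative), new always yields a distinct object, while intern() hands back the instance that the literal "abc" already refers to.

```java
public class NewStringDemo {
    static String createdWithNew() {
        return new String("abc"); // always a fresh object, distinct from the pooled literal
    }

    public static void main(String[] args) {
        String one = "abc";
        String two = createdWithNew();
        System.out.println(one == two);          // false: two distinct String objects
        System.out.println(one.equals(two));     // true: same characters
        System.out.println(one == two.intern()); // true: intern() returns the pooled "abc"
    }
}
```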

Runtime modification and intern()

Java code

Let's say you initially use two different strings, but after some modifications they end up the same:

String one = "abc"; String two = "?abc".substring(1);  //also two = "abc" 

The Java compiler does not perform such an operation at compile time (a method call is never a constant expression, so javac leaves it for run time); have a look:

Class constant pool

Suddenly we ended up with two constant strings pointing to two different constant texts:

```
const #2 = String   #44;    //  abc
const #3 = String   #45;    //  ?abc
const #44 = Asciz   abc;
const #45 = Asciz   ?abc;
```

Byte code

```
ldc #2; //String abc
astore_1    //one

ldc #3; //String ?abc
iconst_1
invokevirtual   #4; //Method String.substring:(I)Ljava/lang/String;
astore_2    //two
```

The first string is constructed as usual. The second is created by first loading the constant "?abc" string and then calling substring(1) on it.

Output

No surprise here: we have two different strings pointing to two different char[] texts in memory:

```
27379847
7615385
8388097
16585653
```

Well, the texts aren't really different; the equals() method will still return true. We have two unnecessary copies of the same text.

Now let's run two exercises. First, try running:

```java
two = two.intern();
```

before printing the hash codes. Not only do one and two point to the same text, they are the same reference!

```
11108810
11108810
15184449
15184449
```

This means both the one.equals(two) and one == two tests pass. We also saved some memory, because the "abc" text now appears only once (the second copy becomes eligible for garbage collection).
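The intern() exercise can be reproduced as a standalone program that checks references directly instead of hash codes. In the sketch below (names are illustrative), substring(1) produces "abc" at run time as a separate object, and intern() replaces it with the pooled instance.

```java
public class InternDemo {
    // Builds "abc" at run time; the method call cannot be folded by the compiler.
    static String computedAbc() {
        return "?abc".substring(1);
    }

    public static void main(String[] args) {
        String one = "abc";
        String two = computedAbc();
        System.out.println(one == two);      // false: substring() returned a new object
        System.out.println(one.equals(two)); // true: same characters
        two = two.intern();                  // swap in the pooled instance
        System.out.println(one == two);      // true
    }
}
```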

The second exercise is slightly different; check this out:

String one = "abc"; String two = "abc".substring(1); 

Obviously one and two are two different objects, pointing to two different texts. But how come the output suggests that they both point to the same char[] array?!?

```
23583040
23583040
11108810
8918249
```

I'll leave the answer to you. It will teach you how substring() works, what the advantages of this approach are, and when it can lead to big trouble.

Answered Oct 23 '22 by Tomasz Nurkiewicz