String Constant Pool and intern

I have been trying to understand the concept of the String constant pool and intern for the last few days. After reading a lot of articles I understand some parts of it, but I am still confused about a few things:

1. String a = "abc" creates an object in the String Constant Pool, but does the following line of code also create the object "xyz" in the String Constant Pool? String b = ("xyz").toLowerCase()

2.

    String c = "qwe";
    String d = c.substring(1);
    d.intern();
    String e = "we";

Should the literal "we" be added to the String constant pool during class loading? If so, why does d == e evaluate to true even though d is not pointing to the String Constant Pool?

Arijit Dasgupta asked Oct 29 '15 14:10



1 Answer

The string pool is populated lazily. If you call intern() yourself before the string literal is first evaluated, then your instance is the version of the string that goes into the string pool. If you do not call intern() yourself, then the string literal populates the string pool for you.

The surprising part is that we can put a string into the string pool before the matching literal from the constant pool gets there, as the code snippets below demonstrate.


To understand why the two code snippets have different behaviour, it is important to be clear that

  1. the constant pool is not the same as the string pool. That is, the constant pool is a section of the class file stored on disk and the string pool is a runtime cache populated with strings.

  2. and that referencing a string literal does not reference the constant pool directly. Instead, as per the Java Language Specification (jls-3.10.5), a string literal populates the string pool from the constant pool if and only if there is not already an equal value within the string pool.

That is to say, the life cycle of a String object from source file to runtime is as follows:

  1. placed into the constant pool by the compiler at compile time and stored within the generated class file (there is one constant pool per class file)
  2. the constant pools are loaded by the JVM at class load time
  3. the strings created from the constant pool are added to the string pool at runtime as intern is called (only if an equivalent string is not already there; if one is, the instance already in the string pool is used); see JVM Spec 5.1 - The Run-Time Constant Pool.
  4. intern can happen explicitly by manually calling intern(), or implicitly by referencing a string literal such as "abc" (jls-3.10.5).
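The explicit and implicit routes into the string pool can be checked with a small sketch. The class name InternBasics and its method names are made up for illustration; the three checks rely only on behaviour described by jls-3.10.5 and the String.intern() documentation.

```java
// Illustrative sketch; InternBasics and its method names are hypothetical.
class InternBasics {

    // Two occurrences of the same literal resolve to one shared instance
    // (implicit interning of string literals).
    static boolean literalsShareInstance() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // new String(...) always allocates a fresh object, so the copy is
    // equal in content but is not the pooled instance.
    static boolean copyIsDistinct() {
        String literal = "abc";
        String copy = new String(literal);
        return copy != literal && copy.equals(literal);
    }

    // Explicit intern() hands back the instance already in the pool,
    // which here is the one the literal refers to.
    static boolean internReturnsPooled() {
        String literal = "abc";
        String copy = new String(literal);
        return copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());
        System.out.println(copyIsDistinct());
        System.out.println(internReturnsPooled());
    }
}
```

Each method returns true, because in every case the pool already holds the instance created for the literal "abc" by the time intern() or the second literal is evaluated.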

The difference in behaviour between the following two code snippets is caused by calling intern() explicitly before the implicit call to intern via the string literal has occurred.

For clarity, here is a run through of the two behaviours that were discussed in the comments to this answer:

    String c = "qwe";   // string literal qwe goes into runtime cache
    String d = c.substring(1); // runtime string "we" is created
    d.intern();         // intern "we"; it has not been seen 
                        // yet so this version goes into the cache
    String e = "we";    // now we see the string literal, but
                        // a value is already in the cache and so 
                        // the same instance as d is returned 
                        // (see ref below)

    System.out.println( e == d );  // prints true

And here is what happens when we intern after the string literal is used:

    String c = "qwe";   // string literal qwe goes into runtime cache
    String d = c.substring(1); // runtime string "we" is created
    String e = "we";    // now we see the string literal, this time
                        // a value is NOT already in the cache and so 
                        // the string literal creates an object and
                        // places it into the cache
    d.intern();         // has no effect - a value already exists
                        // in the cache, and so it will return e

    System.out.println( e == d );  // prints false
    System.out.println( e == d.intern() );  // prints true
    System.out.println( e == d );  // still prints false
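To try both orderings in one run, the snippets can be wrapped in a class along the following lines. This is only a sketch: the class name InternOrder and its method names are invented here, and the second case uses the strings "rty" and "ty" instead of "qwe" and "we" (an assumption made so that the first case cannot pre-populate the pool for the second). The result of internFirst() depends on the JVM resolving literals lazily and on "we" not having been interned earlier by other code, so no output is claimed for it.

```java
// Illustrative sketch; InternOrder and its method names are hypothetical.
class InternOrder {

    // intern() is called before the literal "we" is first evaluated.
    // Whether this yields true depends on the JVM (lazy literal
    // resolution, and no earlier interning of "we" elsewhere).
    static boolean internFirst() {
        String d = "qwe".substring(1); // runtime string "we"
        d.intern();                    // may register d itself in the pool
        String e = "we";               // literal looked up in the pool
        return e == d;
    }

    // The literal "ty" is evaluated first, so the pool already holds an
    // instance by the time intern() is called on the runtime string.
    static boolean literalFirst() {
        String d = "rty".substring(1); // runtime string "ty"
        String e = "ty";               // literal goes into the pool
        d.intern();                    // returns the pooled e; d is unchanged
        return e == d;
    }

    // Regardless of ordering, intern() returns the pooled instance,
    // which is the same one the literal refers to.
    static boolean internMatchesLiteral() {
        String d = "uio".substring(1); // runtime string "io"
        return d.intern() == "io";
    }

    public static void main(String[] args) {
        System.out.println("intern first:  " + internFirst());
        System.out.println("literal first: " + literalFirst());
        System.out.println("intern() == literal: " + internMatchesLiteral());
    }
}
```

Using a different pair of strings per case matters: once a given string has been interned anywhere in the running JVM, every later literal with the same content resolves to that instance, so reusing "we" would make the second case depend on the first.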

Below is the key part of the JLS, stating that intern is implicitly called for string literals.

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

The JVM spec covers the details of the runtime representation of the constant pool loaded from the class file, and how it interacts with intern.

If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

Chris K answered Sep 21 '22 00:09