String literals, interning and reflection

I'm trying to find a third solution to this question.

I can't understand why this doesn't print false.

import java.lang.reflect.Field;

public class MyClass {

    public MyClass() {
        try {
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set("true", f.get("false"));
        } catch (Exception e) {
        }
    }

    public static void main(String[] args) {
        MyClass m = new MyClass();
        System.out.println(m.equals(m));
    }
}

Surely, because of string interning, the "true" instance being modified is exactly the same one used in the print method of PrintStream?

public void print(boolean b) {
    write(b ? "true" : "false");
}

What am I missing?

Edit

An interesting point by @yshavit is that if you add the line

System.out.println(true);

before the try, the output is

true
false
Paul Boddington asked Apr 01 '16 22:04



1 Answer

This is arguably a HotSpot JVM bug.

The problem is in the string literal interning mechanism.

  • java.lang.String instances for string literals are created lazily, during constant pool resolution.
  • Initially, a string literal is represented in the constant pool by a CONSTANT_String_info structure that points to a CONSTANT_Utf8_info.
  • Each class has its own constant pool. That is, MyClass and PrintStream each have their own pair of CONSTANT_String_info / CONSTANT_Utf8_info cpool entries for the literal "true".
  • When a CONSTANT_String_info is accessed for the first time, the JVM starts the resolution process. String interning is part of this process.
  • To find a match for a literal being interned, the JVM compares the contents of the CONSTANT_Utf8_info with the contents of the String instances in the StringTable.
  • ^^^ And here is the problem. Raw UTF data from the cpool is compared with the contents of Java char[] arrays, which a user can spoof via Reflection.
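
The unspoofed behaviour described in the bullets above can be observed without any reflection. The following is a minimal sketch; the class names LiteralHolderA, LiteralHolderB and InternDemo are hypothetical and exist only for this illustration. Each holder class has its own constant pool entry for "true", yet both are expected to resolve to one shared instance.

```java
class InternDemo {
    // True when both classes resolved their literal to the same String instance.
    static boolean sameInstance() {
        return LiteralHolderA.literal() == LiteralHolderB.literal();
    }

    public static void main(String[] args) {
        // JLS §3.10.5 requires the two literals to be the same instance.
        System.out.println(sameInstance());
    }
}

// Two unrelated classes, each with its own constant pool entry for "true".
class LiteralHolderA {
    static String literal() { return "true"; }
}

class LiteralHolderB {
    static String literal() { return "true"; }
}
```

The == comparison is deliberate here: it checks reference identity, which is exactly what interning is supposed to guarantee.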

So, what's happening in your test?

  1. f.set("true", f.get("false")) triggers the resolution of the literal "true" in MyClass.
  2. The JVM finds no instance in the StringTable matching the sequence "true", so it creates a new java.lang.String and stores it in the StringTable.
  3. The value of that String from the StringTable is replaced via Reflection.
  4. System.out.println(true) triggers the resolution of the literal "true" in the PrintStream class.
  5. The JVM compares the UTF sequence "true" with the Strings in the StringTable but finds no match, since that String now has the value "false". Another String for "true" is created and placed in the StringTable.
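
The five steps above can be replayed on a toy model that needs no reflection. This is only a sketch under loud assumptions: ToyString and ToyStringTable are hypothetical stand-ins invented for this illustration, the lookup is a linear scan rather than HotSpot's real hash table, and nothing here is actual JVM code. What it keeps from the explanation is the key property: lookup compares the constant pool text with the current, mutable contents of the stored strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToyInterning {
    // Hypothetical stand-in for java.lang.String: contents live in a replaceable array.
    static final class ToyString {
        char[] value;
        ToyString(String text) { value = text.toCharArray(); }
    }

    // Hypothetical stand-in for the StringTable (linear scan, for simplicity).
    static final class ToyStringTable {
        private final List<ToyString> entries = new ArrayList<>();

        // Resolve a literal by comparing the raw constant pool text
        // with the CURRENT contents of each stored string.
        ToyString resolve(String utf8) {
            char[] wanted = utf8.toCharArray();
            for (ToyString s : entries) {
                if (Arrays.equals(s.value, wanted)) {
                    return s;
                }
            }
            ToyString created = new ToyString(utf8);
            entries.add(created);
            return created;
        }
    }

    // Replays steps 1-5 and reports whether both resolutions of "true" agree.
    static boolean literalsShareInstanceAfterSpoofing() {
        ToyStringTable table = new ToyStringTable();
        ToyString fromMyClass = table.resolve("true");      // steps 1-2
        ToyString falseLiteral = table.resolve("false");
        fromMyClass.value = falseLiteral.value;             // step 3: the spoof
        ToyString fromPrintStream = table.resolve("true");  // steps 4-5
        return fromMyClass == fromPrintStream;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstanceAfterSpoofing()); // prints false
    }
}
```

In this model the second resolution of "true" scans the table, sees only entries whose contents now read "false", and allocates a fresh entry, which mirrors step 5.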

Why do I think this is a bug?

JLS §3.10.5 and JVMS §5.1 require that string literals containing the same sequence of characters refer to the same instance of java.lang.String.

However, in the following code the resolution of two string literals with the same sequence of characters results in different instances.

import java.lang.reflect.Field;

public class Test {

    static class Inner {
        static String trueLiteral = "true";
    }

    public static void main(String[] args) throws Exception {
        Field f = String.class.getDeclaredField("value");
        f.setAccessible(true);
        f.set("true", f.get("false"));

        if ("true" == Inner.trueLiteral) {
            System.out.println("OK");
        } else {
            System.out.println("BUG!");
        }
    }
}

A possible fix for the JVM is to store a pointer to the original UTF sequence in the StringTable along with the java.lang.String object, so that the interning process does not compare cpool data (inaccessible to the user) with value arrays (accessible via Reflection).
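
The idea behind this fix can be sketched on a toy model as well. Everything here is an assumption made for illustration: ToyString and KeyedStringTable are hypothetical names, and a HashMap keyed by the original text stands in for "a pointer to the original UTF sequence". It is not how HotSpot is implemented, only a demonstration that keying the lookup on the original sequence makes it immune to later changes of the value array.

```java
import java.util.HashMap;
import java.util.Map;

class ToyInterningFixed {
    // Hypothetical stand-in for java.lang.String: contents live in a replaceable array.
    static final class ToyString {
        char[] value;
        ToyString(String text) { value = text.toCharArray(); }
    }

    // Hypothetical table that remembers the original constant pool text
    // next to each string and never inspects the mutable contents.
    static final class KeyedStringTable {
        private final Map<String, ToyString> byOriginalText = new HashMap<>();

        ToyString resolve(String utf8) {
            return byOriginalText.computeIfAbsent(utf8, ToyString::new);
        }
    }

    // Same scenario as in the answer: resolve, spoof, resolve again.
    static boolean literalsShareInstanceAfterSpoofing() {
        KeyedStringTable table = new KeyedStringTable();
        ToyString first = table.resolve("true");
        first.value = table.resolve("false").value;   // spoof the contents
        ToyString second = table.resolve("true");     // lookup by original text
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstanceAfterSpoofing()); // prints true
    }
}
```

With this lookup both literals resolve to one instance even after the spoof, so the identity guarantee holds, although the shared instance would still carry the spoofed contents.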

apangin answered Oct 20 '22 00:10