
Where do Java and .NET string literals reside?

A recent question about string literals in .NET caught my eye. I know that string literals are interned so that different strings with the same value refer to the same object. I also know that a string can be interned at runtime:

string now = String.Intern(DateTime.Now.ToString());

Obviously a string that is interned at runtime resides on the heap, but I had assumed that a literal is placed in the program's data segment (and said so in my answer to that question). However, I don't remember seeing this stated anywhere. I assumed it because that's how I would do it, and the fact that the ldstr IL instruction is used to load literals, with no allocation apparently taking place, seems to back me up.

To cut a long story short, where do string literals reside? Are they on the heap, in the data segment, or someplace I haven't thought of?


Edit: If string literals do reside on the heap, when are they allocated?

Motti asked Dec 16 '08 20:12



2 Answers

Strings in .NET are reference types, so they are always on the heap (even when they are interned). You can verify this using a debugger such as WinDbg.

If you have the class below

   class SomeType {
      public void Foo() {
         string s = "hello world";
         Console.WriteLine(s);
         Console.WriteLine("press enter");
         Console.ReadLine();
      }
   }

and you call Foo() on an instance, you can use WinDbg to inspect the heap.

In a small program the reference will most likely be held in a register, so the easiest way to find the reference to the specific string is to run !dso. This gives us the address of the string in question:

0:000> !dso
OS Thread Id: 0x1660 (0)
ESP/REG  Object   Name
002bf0a4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
002bf0b4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
002bf0e8 025d4e5c System.Byte[]
002bf0ec 025d4c0c System.IO.__ConsoleStream
002bf110 025d4c3c System.IO.StreamReader
002bf114 025d4c3c System.IO.StreamReader
002bf12c 025d5180 System.IO.TextReader+SyncTextReader
002bf130 025d4c3c System.IO.StreamReader
002bf140 025d5180 System.IO.TextReader+SyncTextReader
002bf14c 025d5180 System.IO.TextReader+SyncTextReader
002bf15c 025d2d04 System.String    hello world             // THIS IS THE ONE
002bf224 025d2ccc System.Object[]    (System.String[])
002bf3d0 025d2ccc System.Object[]    (System.String[])
002bf3f8 025d2ccc System.Object[]    (System.String[])

Now use !gcgen to find out which generation the instance is in:

0:000> !gcgen 025d2d04
Gen 0

It's in generation zero - i.e. it has just been allocated. Who's rooting it?

0:000> !gcroot 025d2d04
Note: Roots found on stacks may be false positives. Run "!help gcroot" for more info.
Scan Thread 0 OSTHread 1660
ESP:2bf15c:Root:025d2d04(System.String)
Scan Thread 2 OSTHread 16b4
DOMAIN(000E4840):HANDLE(Pinned):6513f4:Root:035d2020(System.Object[])->
025d2d04(System.String)

The ESP root is the stack reference from our Foo() method, but notice that we have an object[] as well. That's the intern table. Let's take a look.

0:000> !dumparray 035d2020
Name: System.Object[]
MethodTable: 006984c4
EEClass: 00698444
Size: 528(0x210) bytes
Array: Rank 1, Number of elements 128, Type CLASS
Element Methodtable: 00696d3c
[0] 025d1360
[1] 025d137c
[2] 025d139c
[3] 025d13b0
[4] 025d13d0
[5] 025d1400
[6] 025d1424
...
[36] 025d2d04  // THIS IS OUR STRING
...
[126] null
[127] null

I reduced the output somewhat, but you get the idea.

In conclusion: strings are on the heap - even when they are interned. The intern table holds a reference to the instance on the heap, which means interned strings are not collected during GC because the intern table roots them.

Brian Rasmussen answered Oct 07 '22 03:10


In Java (from the Java Glossary):

In Sun’s JVM, the interned Strings (which includes String literals) are stored in a special pool of RAM called the perm gen, where the JVM also loads classes and stores natively compiled code. However, the interned Strings behave no differently than had they been stored in the ordinary object heap.
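
To see what "behave no differently" means in practice, here is a minimal sketch that compares references rather than contents. The class name InternDemo and the helper sameObject are made up for this illustration and are not part of any library. It relies only on behavior the Java language guarantees: equal literals share one object, a string built at runtime is a separate object, and intern() returns the pooled one. (HotSpot moved interned strings out of the perm gen and into the main heap in JDK 7, which does not change any of this.)

```java
final class InternDemo {
    // Hypothetical helper: true only when both references point at the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello world";
        String sameLiteral = "hello world";

        // Built at runtime, so this is a distinct object with equal contents.
        String runtime = new StringBuilder("hello ").append("world").toString();

        // intern() looks the contents up in the pool and returns the pooled instance.
        String interned = runtime.intern();

        System.out.println(sameObject(literal, sameLiteral)); // true: literals share one object
        System.out.println(sameObject(literal, runtime));     // false: runtime copy is separate
        System.out.println(sameObject(literal, interned));    // true: intern() returns the literal's object
        System.out.println(literal.equals(runtime));          // true: contents are equal regardless
    }
}
```

Nothing here depends on which memory region holds the pool; the program only observes object identity, which is the same whether the pooled instance lives in the perm gen or in the ordinary heap.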

Michael Myers answered Oct 07 '22 02:10