
Intern string literals misunderstanding?

I don't understand:

MSDN says

http://msdn.microsoft.com/en-us/library/system.string.intern.aspx

Consequently, an instance of a literal string with a particular value only exists once in the system.

For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable.

Is this the default behavior (without calling Intern), or does it happen only when I use the Intern method?

  • If it is the default, why would I ever want to use Intern? (There will only be one instance anyway.)

  • If it is NOT the default: suppose I write this line 1000 times:

    Console.WriteLine("lalala");

1) Will I get 1000 occurrences of "lalala" in memory (without using Intern)?

2) Will "lalala" eventually be GC'd?

3) Is "lalala" already interned? And if it is, why would I need to "get" it from the pool rather than just writing "lalala" again?

I'm a bit confused.

Royi Namir asked Jan 01 '12 07:01


2 Answers

String literals get interned automatically (so, if your code contains "lalala" 1000 times, only one instance will exist).

Such strings are not garbage-collected while the process is running, and every reference to them is a reference to the interned instance.


string.Intern is there for strings that are not literals (say, from user input, or read from a file or database) that you know will be repeated very often and are therefore worth interning for the lifetime of the process.
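The difference between literals and run-time strings can be checked with object.ReferenceEquals. The following is a minimal sketch, not a definitive test: the class and method names (InternDemo, LiteralsShareReference, RuntimeStringVersusLiteral) are made up for illustration, and the behaviour described assumes a typical JIT-compiled .NET runtime.

```csharp
using System;

public static class InternDemo
{
    // Two occurrences of the same literal in source code.
    public static bool LiteralsShareReference()
    {
        string a = "lalala";
        string b = "lalala";
        return object.ReferenceEquals(a, b);
    }

    // A string built at run time has the same characters as the literal,
    // but it is a separate object until it is passed through string.Intern.
    public static (bool before, bool after) RuntimeStringVersusLiteral()
    {
        string built = new string(new[] { 'l', 'a', 'l', 'a', 'l', 'a' });
        bool before = object.ReferenceEquals(built, "lalala");

        // string.Intern returns the pooled instance with the same value.
        string pooled = string.Intern(built);
        bool after = object.ReferenceEquals(pooled, "lalala");
        return (before, after);
    }
}
```

Under those assumptions the two literals should share one reference, the run-time string should not match the literal by reference before interning, and the value returned by string.Intern should.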

Oded answered Oct 14 '22 02:10


Interning is something that happens behind the scenes, so you as a programmer rarely have to worry about it. You generally do not have to put anything into the pool, or get anything from the pool. It is like garbage collection: you never have to invoke it, or worry that it may happen, or worry that it may not happen. (Well, in 99.999% of the cases. And the remaining 0.001 percent is when you are doing very weird stuff.)

The compiler and the runtime take care of interning all string literals that are contained within your source file, so "lalala" will be interned without you having to do anything. And whenever you refer to "lalala" in your program, the runtime hands you the instance from the intern pool, again without you having to do anything.

The intern pool normally contains a more-or-less fixed number of strings, generally small ones (together only a fraction of the total size of your .exe), so it does not matter that they never get garbage-collected.


EDIT

The purpose of interning strings is to save memory and to speed up certain string operations like Equals(). The Equals() method of String first checks whether the strings are equal by reference, which is extremely fast; if the references are equal, then it returns true immediately. If the references are not equal, it cannot conclude anything from that alone: it does not check whether the strings are interned, so it compares their lengths and, if those match, proceeds with a character-by-character comparison. (It does not consult hash codes either.) If you know that both strings are interned, you can compare them with object.ReferenceEquals yourself, because all strings in the intern pool are different from each other.
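To make the order of checks concrete, here is a simplified model of an ordinal string comparison. It is only a sketch: EqualsSketch and OrdinalEqualsSketch are hypothetical names, and this is not the real String.Equals source, just an illustration of the assumed order (reference check first, then length, then characters).

```csharp
using System;

public static class EqualsSketch
{
    // Hypothetical helper modelling the assumed order of checks;
    // it is not the actual String.Equals implementation.
    public static bool OrdinalEqualsSketch(string a, string b)
    {
        // Fast path: same object (or both null).
        if (object.ReferenceEquals(a, b)) return true;

        // Exactly one of them is null.
        if (a is null || b is null) return false;

        // Strings of different length can never be equal.
        if (a.Length != b.Length) return false;

        // Slow path: compare character by character.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

The fast path is the only place where interning helps: two interned strings with the same value are the same object, so the comparison ends at the first line.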

So, suppose that you are reading tokens from a file in string s, and you have a switch statement of the following form:

    switch (s)
    {
        case "cat": /* ... */ break;
        case "dog": /* ... */ break;
        case "tod": /* ... */ break;
    }

The string literals "cat", "dog", "tod" have all been interned, but you are comparing each of them against s, which has not been interned, so you are not reaping the benefits of the intern pool. If you intern s right before the switch statement, then a matching case can be recognized by the reference check alone, without comparing characters. Whether that is a net win depends on the cost of the Intern lookup itself, so measure before relying on it.
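Interning the token before the switch could look like the sketch below. TokenClassifier, Classify and the numeric return codes are invented for illustration; only string.Intern and the three case labels come from the discussion above.

```csharp
using System;

public static class TokenClassifier
{
    // Hypothetical handler: returns an arbitrary code per known token.
    public static int Classify(string token)
    {
        // Replace the run-time string with the pooled instance, so that a
        // matching case label is the very same object as s.
        string s = string.Intern(token);

        switch (s)
        {
            case "cat": return 1;
            case "dog": return 2;
            case "tod": return 3;
            default: return 0;
        }
    }

    // Shows that the interned token and the literal are one object.
    public static bool InternedTokenIsLiteral(string token)
    {
        return object.ReferenceEquals(string.Intern(token), "cat");
    }
}
```

The switch gives the same result with or without the Intern call; interning only changes how quickly a matching comparison can finish.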

Of course, if there is any possibility that your file might contain garbage, then you do NOT want to do this: every distinct string you intern stays in the pool for the lifetime of the process, so loading lots of random strings into it wastes memory, slows your program down, and can eventually exhaust memory.
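One possible way to avoid filling the pool with garbage is String.IsInterned, which looks a string up in the pool without adding it and returns null when it is not there. The sketch below assumes that the tokens you care about already exist as literals in your program; SafeLookup and PooledOrOriginal are hypothetical names.

```csharp
using System;

public static class SafeLookup
{
    // Returns the pooled instance if one already exists; otherwise returns
    // the original string unchanged. Nothing is ever added to the pool.
    public static string PooledOrOriginal(string token)
    {
        return string.IsInterned(token) ?? token;
    }
}
```

With this approach, known tokens such as "cat" still end up as the pooled instance, while unknown input remains an ordinary string that the garbage collector can reclaim.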

Mike Nakis answered Oct 14 '22 02:10