
Do string literals get optimised by the compiler?

Does the C# compiler or .NET CLR do any clever memory optimisation of string literals/constants? I could swear I'd heard of the concept of "string internalisation" so that in any two bits of code in a program, the literal "this is a string" would actually refer to the same object (presumably safe, what with strings being immutable?). I can't find any useful reference to it on Google though...

Have I heard this wrong? Don't worry - I'm not doing anything horrible in my code with this information, just want to better my understanding of how it works under the covers.

Neil Barnwell asked Nov 26 '10 15:11



1 Answer

EDIT: While I strongly suspect the statement below is true for all C# compiler implementations, I'm not sure it's actually guaranteed in the spec. Section 2.4.4.5 of the spec talks about literals referring to the same string instance, but it doesn't mention other constant string expressions. I suspect this is an oversight in the spec - I'll email Mads and Eric about it.


It's not just string literals. It's any string constant. So for example, consider:

class Constants
{
    public const string X = "X";
    public const string Y = "Y";
    public const string XY = "XY";

    void Foo()
    {
        // X + Y is a constant expression, so the compiler can fold it to "XY".
        string z = X + Y;
    }
}

The compiler realises that the concatenation here (for z) is between two constant strings, and so the result is also a constant string. Therefore the initial value of z will be the same reference as the value of XY, because they're compile-time constants with the same value.
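One way to check this on your own compiler and runtime is to compare references directly. This is only a sketch: the class name InternDemo and the variable runtime are made up for illustration, and, as noted above, the behaviour for constant expressions may not be guaranteed by the spec, so treat the results as observations rather than promises.

```csharp
using System;

class InternDemo
{
    public const string X = "X";
    public const string Y = "Y";
    public const string XY = "XY";

    static void Main()
    {
        // Constant expression: the compiler can fold X + Y into "XY".
        string z = X + Y;

        // Built at run time from a char array, so it is allocated separately.
        string runtime = new string(new[] { 'X', 'Y' });

        // Does the folded constant share a reference with XY?
        Console.WriteLine(object.ReferenceEquals(z, XY));

        // Same value as XY, but is it the same object?
        Console.WriteLine(object.ReferenceEquals(runtime, XY));

        // string.Intern returns the pooled instance for this value, if any.
        Console.WriteLine(object.ReferenceEquals(string.Intern(runtime), XY));
    }
}
```

If the description above holds for your compiler, the first comparison should report that z and XY are the same object, while the run-time string is a different object until it is passed through string.Intern.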

EDIT: The reply from Mads and Eric suggested that in the Microsoft C# compiler string constants and string literals are usually treated the same way - but that other implementations may differ.

Jon Skeet answered Sep 21 '22 05:09