
What happens during string initialization?

Tags: c#

What happens exactly during string initialization?

string s = "Hello World!";

Is it going to make a call to any of these constructors?

public String(char* value);
public String(char[] value);
Raj Karri asked Jun 26 '15 23:06



2 Answers

I took a look at the CoreCLR repository to see what ldstr (see Filip Bulovic's answer) does under the hood and found the path to be something like this:

  1. [vm/interpreter.cpp] The interpreter's IL evaluation loop hits case CEE_LDSTR and calls Interpreter::LdStr()
  2. LdStr() calls ConstructStringLiteral and passes the module of the current method and the string pointer (current IL instruction location + 1)
  3. [vm/jithelpers.cpp] ConstructStringLiteral calls Module::ResolveStringRef
  4. [vm/ceeload.cpp] ResolveStringRef calls InitializeStringData, then LoaderAllocator::GetStringObjRefPtrFromUnicodeString
  5. [vm/loaderallocator.cpp] GetStringObjRefPtrFromUnicodeString calls the LoaderAllocator specific string literal map's GetStringLiteral
  6. [vm/stringliteralmap.cpp] GetStringLiteral hashes the string and tries to find the string object in the local string entry hash table. If it is found, that string object is returned. If not, the global string literal map is searched next. If the literal isn't in the global map either, it's added there with GlobalStringLiteralMap::AddStringLiteral.
  7. AddStringLiteral creates the COM+ string object with a call to AllocateStringObject, allocates an object handle for it, and adds the literal to the table as the key and the object as the value.
    • AllocateStringObject: counts the characters, asks the garbage collector to allocate a string of that size, and copies the string constant into the COM+ string object. The string object is then tested with GetIsOnlyLowChars; if that returns true, the flag STRING_STATE_FAST_OPS is set in the COM+ string object, "...which indicates if the string can be sorted in a fast way. The flag is persisted to assembly containing the string literals. We restore the flag when we load strings from assembly..."
      • GetIsOnlyLowChars does a bitwise AND of ONLY_LOW_CHARS_MASK (which is 0x80000000) with the characters in the string and returns true if the string contains only characters less than 0x80. The managed (internal) methods String.IsFastSort() and String.IsAscii() make use of this.
  8. All the way back in [vm/interpreter.cpp]: the string object handle gets pushed onto the stack.
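To make the lookup order in steps 6 and 7 easier to follow, here is a rough C# model of it. This is only a sketch: the real code is C++ inside the runtime, and every name below (LiteralMapSketch, GlobalMap, IsOnlyLowChars and so on) is made up for illustration. Caching the global result in the local map is also an assumption of the model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the two-level literal lookup; not CoreCLR code.
public sealed class LiteralMapSketch
{
    // Stand-in for the process-wide (global) string literal map.
    private static readonly Dictionary<string, string> GlobalMap =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Stand-in for the per-LoaderAllocator (local) map.
    private readonly Dictionary<string, string> _localMap =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public string GetStringLiteral(char[] literalChars)
    {
        // In this model the key doubles as the "allocated" string object.
        string key = new string(literalChars);
        string cached;

        // 1. Local map hit: return the object stored earlier.
        if (_localMap.TryGetValue(key, out cached))
            return cached;

        // 2. Global map: reuse an existing entry or add a new one
        //    (the AddStringLiteral step in the list above).
        lock (GlobalMap)
        {
            if (!GlobalMap.TryGetValue(key, out cached))
            {
                cached = key;
                GlobalMap.Add(key, cached);
            }
        }

        // Assumption of this model: remember the result locally.
        _localMap[key] = cached;
        return cached;
    }

    // Mirrors the described check: true when every char is below 0x80.
    public static bool IsOnlyLowChars(string s)
    {
        foreach (char c in s)
        {
            if (c >= 0x80) return false;
        }
        return true;
    }
}

public static class LiteralMapDemo
{
    public static void Main()
    {
        var first = new LiteralMapSketch();
        var second = new LiteralMapSketch();

        string s1 = first.GetStringLiteral("Hello World!".ToCharArray());
        string s2 = second.GetStringLiteral("Hello World!".ToCharArray());

        Console.WriteLine(object.ReferenceEquals(s1, s2));       // True
        Console.WriteLine(LiteralMapSketch.IsOnlyLowChars(s1));  // True
    }
}
```

Two separate "local" maps end up handing out the same object because both fall through to the shared global map, which is the behaviour the steps above describe.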

In conclusion, string literals take quite a specific path which doesn't make any calls to the managed String(char*) or String(char[]) constructors. However, I haven't found the implementation of those constructors yet, so I can only assume that they both call AllocateStringObject at some point.
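You can observe the difference between the two paths from managed code. The snippet below is just an illustration (the class and the SameObject helper are my own names): two identical literals in the same assembly refer to one string object, while the String(char[]) constructor allocates a fresh one with equal contents.

```csharp
using System;

public static class LiteralVsConstructorDemo
{
    // Hypothetical helper: true when both references point at one object.
    public static bool SameObject(object x, object y)
    {
        return object.ReferenceEquals(x, y);
    }

    public static void Main()
    {
        string a = "Hello World!";
        string b = "Hello World!";
        // Goes through the String(char[]) constructor, not the literal path.
        string c = new string("Hello World!".ToCharArray());

        Console.WriteLine(SameObject(a, b)); // True
        Console.WriteLine(SameObject(a, c)); // False
        Console.WriteLine(a == c);           // True (same characters)
    }
}
```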

I hope this answer fits your idea of "exactly".

cbr answered Nov 09 '22 23:11



Here is C#:

using System;

class MainClass
{
    public static void Main (string[] args)
    {
        string hello = "Hello World!";
        Console.WriteLine (hello);
    }
}

and here is IL:

// method line 2
.method public static hidebysig 
       default void Main (string[] args)  cil managed 
{
    // Method begins at RVA 0x20f4
.entrypoint
// Code size 13 (0xd)
.maxstack 2
.locals init (
    string  V_0)
IL_0000:  ldstr "Hello World!"
IL_0005:  stloc.0 
IL_0006:  ldloc.0 
IL_0007:  call void class [mscorlib]System.Console::WriteLine(string)
IL_000c:  ret 
} // end of method MainClass::Main

The ldstr instruction pushes an object reference (type O) to a new string object representing the specific string literal stored in the metadata. The ldstr instruction allocates the requisite amount of memory and performs any format conversion required to convert the string literal from the form used in the file to the string format required at runtime.
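As a rough idea of what "the form used in the file" means: per ECMA-335, a literal referenced by ldstr lives in the #US metadata heap as a compressed length prefix, the UTF-16 little-endian bytes of the characters, and one trailing flag byte. The sketch below decodes such an entry from a hand-built byte array. It is not how the runtime does it, and the names UserStringHeapSketch and DecodeUserString are invented for this example.

```csharp
using System;
using System.Text;

public static class UserStringHeapSketch
{
    // Decodes one #US heap entry that starts at 'offset' (hypothetical helper).
    public static string DecodeUserString(byte[] heap, int offset)
    {
        int length; // payload size in bytes, including the trailing flag byte
        byte first = heap[offset];

        if ((first & 0x80) == 0)            // 1-byte length: 0xxxxxxx
        {
            length = first;
            offset += 1;
        }
        else if ((first & 0xC0) == 0x80)    // 2-byte length: 10xxxxxx
        {
            length = ((first & 0x3F) << 8) | heap[offset + 1];
            offset += 2;
        }
        else                                // 4-byte length: 110xxxxx
        {
            length = ((first & 0x1F) << 24) | (heap[offset + 1] << 16)
                   | (heap[offset + 2] << 8) | heap[offset + 3];
            offset += 4;
        }

        if (length == 0) return string.Empty;

        // The last byte is a flag, not character data, so skip it.
        return Encoding.Unicode.GetString(heap, offset, length - 1);
    }

    public static void Main()
    {
        // Build a sample entry by hand: length prefix, UTF-16LE chars, flag byte.
        byte[] chars = Encoding.Unicode.GetBytes("Hello World!"); // 24 bytes
        byte[] entry = new byte[1 + chars.Length + 1];
        entry[0] = (byte)(chars.Length + 1);                      // 25
        Array.Copy(chars, 0, entry, 1, chars.Length);
        entry[entry.Length - 1] = 0;                              // flag byte

        Console.WriteLine(DecodeUserString(entry, 0)); // Hello World!
    }
}
```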

Filip Bulovic answered Nov 09 '22 21:11
