string in C# is a reference type that behaves like a value type. Usually programmers do not have to worry about this, since strings are immutable and the language design prevents us from doing unintentionally dangerous things with them. However, with unsafe pointer logic it is possible to manipulate the underlying value of a string directly, like so:
using System;

class Program
{
    static string foo = "FOO";
    static string bar = "FOO";
    const string constFoo = "FOO";

    // Requires compiling with unsafe code enabled (/unsafe or <AllowUnsafeBlocks>).
    static unsafe void Main(string[] args)
    {
        fixed (char* p = foo)
        {
            for (int i = 0; i < foo.Length; i++)
                p[i] = 'M';
        }
        Console.WriteLine($"foo = {foo}"); //MMM
        Console.WriteLine($"bar = {bar}"); //MMM
        Console.WriteLine($"constFoo = {constFoo}"); //FOO
    }
}
When this runs, the two identical literals are interned, so both foo and bar point to the same string object. By manipulating foo this way, we also change the value of bar. The const value is inlined by the compiler and is not affected by this. Nothing strange so far.
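One way to confirm the sharing without any unsafe code is to compare references instead of contents. This is a minimal sketch: the class name InterningDemo and its helper methods are hypothetical, and it assumes the usual runtime behaviour of interning string literals. A string built at run time has the same contents but is a separate object.

```csharp
using System;

class InterningDemo
{
    static string foo = "FOO";
    static string bar = "FOO";
    const string constFoo = "FOO";

    // Hypothetical helper: builds "FOO" at run time, so the result is a new object.
    public static string BuildFoo() => new string('F', 1) + "OO";

    // True when both static fields hold the very same object (the interned literal).
    public static bool FieldsShareInstance() => object.ReferenceEquals(foo, bar);

    // Each use of the const is compiled to a load of the literal "FOO".
    public static bool ConstSharesInstance() => object.ReferenceEquals(foo, constFoo);

    static void Main()
    {
        string built = BuildFoo();
        Console.WriteLine(FieldsShareInstance());              // True
        Console.WriteLine(ConstSharesInstance());              // True
        Console.WriteLine(foo == built);                       // True: equal contents
        Console.WriteLine(object.ReferenceEquals(foo, built)); // False: a different object
    }
}
```

Reference comparison is what matters here: == on strings compares contents, so it cannot tell shared and separate objects apart.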
Let us change the fixed variable from foo to constFoo, and we start seeing some strange behaviour.
using System;

class Program
{
    static string foo = "FOO";
    static string bar = "FOO";
    const string constFoo = "FOO";

    static unsafe void Main(string[] args)
    {
        fixed (char* p = constFoo)
        {
            for (int i = 0; i < constFoo.Length; i++)
                p[i] = 'M';
        }
        Console.WriteLine($"foo = {foo}"); //MMM
        Console.WriteLine($"bar = {bar}"); //MMM
        Console.WriteLine($"constFoo = {constFoo}"); //FOO
    }
}
Although it is constFoo that we fixed and manipulated, it is foo and bar that are mutated.
Why are foo and bar being mutated?
It gets even stranger if we now change the values of foo and bar:
using System;

class Program
{
    static string foo = "BAR";
    static string bar = "BAR";
    const string constFoo = "FOO";

    static unsafe void Main(string[] args)
    {
        fixed (char* p = constFoo)
        {
            for (int i = 0; i < constFoo.Length; i++)
                p[i] = 'M';
        }
        Console.WriteLine($"foo = {foo}"); //BAR
        Console.WriteLine($"bar = {bar}"); //BAR
        Console.WriteLine($"constFoo = {constFoo}"); //FOO
    }
}
The code runs and we appear to mutate something somewhere, but there is no change to our variables. What are we mutating in this code?
You are modifying the string in the interned string table, as the following code demonstrates:
using System;

namespace CoreApp1
{
    class Program
    {
        const string constFoo = "FOO";

        static unsafe void Main(string[] args)
        {
            fixed (char* p = constFoo)
            {
                for (int i = 0; i < constFoo.Length; i++)
                    p[i] = 'M';
            }

            // Madness ensues: The next line prints "MMM":
            Console.WriteLine("FOO"); // Prints the interned value of "FOO" which is now "MMM"
        }
    }
}
Here's something a little harder to explain:
using System;
using System.Runtime.InteropServices;

namespace CoreApp1
{
    class Program
    {
        const string constFoo = "FOO";

        static void Main()
        {
            char[] chars = new StringToChar { str = constFoo }.chr;

            for (int i = 0; i < constFoo.Length; i++)
            {
                chars[i] = 'M';
                Console.WriteLine(chars[i]); // Always prints "M".
            }

            Console.WriteLine("FOO"); // x86: Prints "MMM". x64: Prints "FOM".
        }
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct StringToChar
    {
        [FieldOffset(0)] public string str;
        [FieldOffset(0)] public char[] chr;
    }
}
This doesn't use any unsafe code, but it still mutates the string in the intern table.
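What's harder to explain here is that on x86 the interned string is changed to "MMM", as you'd expect, but on x64 it becomes "FOM". What happened to the changes to the first two characters? I can't say for certain, but the most likely cause is that the two object layouts only line up on x86. On both platforms the length field sits directly after the method-table pointer, so the reinterpreted char[] sees a Length of 3. On x64, however, an array's elements start after 4 bytes of padding following its length field, while a string's characters start immediately after its length. That shifts the array view by two characters: chars[0] overwrites the third character of the string, and chars[1] and chars[2] land past its end (on the null terminator and the allocation padding). On x86 there is no such padding, so all three writes hit the string's characters.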
To help you understand this, you can decompile the assembly and inspect the IL code.
Taking your second snippet, you will get something like this:
// static fields initialization
.method specialname static void .cctor () cil managed
{
    IL_0000: ldstr "FOO"
    IL_0005: stsfld string Program::foo
    IL_000a: ldstr "FOO"
    IL_000f: stsfld string Program::bar
}

.method static void Main() cil managed
{
    .entrypoint
    .locals init (
        [0] char* p,
        [1] string pinned,
        // ...
    )

    // fixed (char* ptr = "FOO")
    IL_0001: ldstr "FOO"
    IL_0006: stloc.1
    IL_0007: ldloc.1
    IL_0008: conv.u
    IL_0009: stloc.0
    // ...
}
Note that in all three cases (foo, bar and the fixed statement), the string is loaded onto the evaluation stack using the ldstr opcode.
From the documentation:
The Common Language Infrastructure (CLI) guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object (a process known as "string interning").
So in all three cases you get the same string object: the interned string instance. This explains the "mutated" const object.
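The same guarantee can be observed through the public intern-pool API, without touching any memory. This is a minimal sketch: the class name InternPoolDemo and its helper are hypothetical, and it assumes the default runtime behaviour where literals are interned. string.Intern returns the pooled instance for a given sequence of characters, which is the same object that ldstr "FOO" produces.

```csharp
using System;

class InternPoolDemo
{
    // Hypothetical helper: builds "FOO" from a char array, so it starts outside the pool.
    public static string BuildFoo() => new string(new[] { 'F', 'O', 'O' });

    static void Main()
    {
        string built = BuildFoo();
        Console.WriteLine(object.ReferenceEquals(built, "FOO"));  // False: separate object

        // string.Intern hands back the pooled instance for these characters,
        // i.e. the same object every "FOO" literal refers to.
        string pooled = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(pooled, "FOO")); // True

        // string.IsInterned returns the pooled instance if one exists, otherwise null.
        Console.WriteLine(object.ReferenceEquals(string.IsInterned(built), "FOO")); // True
    }
}
```

Because every "FOO" literal resolves to that one pooled object, writing through a pointer to it (as the fixed snippets do) changes what all of those literals see.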