
Should .NET strings really be considered immutable?

Consider the following code:

unsafe
{
    string foo = string.Copy("This can't change");

    fixed (char* ptr = foo)
    {
        char* pFoo = ptr;
        pFoo[8] = pFoo[9] = ' ';
    }

    Console.WriteLine(foo); // "This can   change"
}

This pins foo, obtains a pointer to its first character, copies that pointer into an ordinary pointer variable, and overwrites the characters at indexes 8 and 9 with spaces.

Notice I never actually reassigned foo; instead, I changed its value by modifying its state, or mutating the string. Therefore, .NET strings are mutable.

This works so well, in fact, that the following code:

unsafe
{
    string bar = "Watch this";

    fixed (char* p = bar)
    {
        char* pBar = p;
        pBar[0] = 'C';
    }

    string baz = "Watch this";
    Console.WriteLine(baz); // Unrelated, right?
}

will print "Catch this" due to string literal interning.

This has plenty of practical uses. For example, this:

string GetForInputData(byte[] inputData)
{
    // allocate a mutable buffer...
    char[] buffer = new char[inputData.Length];

    // fill the buffer with input data

    // ...and a string to return
    return new string(buffer);
}

gets replaced by:

unsafe string GetForInputData(byte[] inputData)
{
    // allocate a string to return
    string result = new string('\0', inputData.Length);

    fixed (char* ptr = result)
    {
        // fill the result with input data
    }

    return result; // return it
}

This could save significant allocation and copying costs in speed-critical code (e.g. encodings), because both the intermediate char[] buffer and the copy made by new string(buffer) disappear.

I guess you could say that this doesn't count because it "uses a hack" to make pointers mutable, but then again it was the C# language designers who supported assigning a string to a pointer in the first place. (In fact, this is done all the time internally in String and StringBuilder, so technically you could make your own StringBuilder with this.)

So, should .NET strings really be considered immutable?

asked Aug 10 '15 23:08 by James Ko


2 Answers

§ 18.6 of the C# language specification (The fixed statement) specifically addresses the case of modifying a string through a fixed pointer, and indicates that doing so can result in undefined behavior:

Modifying objects of managed type through fixed pointers can results in undefined behavior. For example, because strings are immutable, it is the programmer’s responsibility to ensure that the characters referenced by a pointer to a fixed string are not modified.
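If the goal is the allocation saving from the question's GetForInputData example, there is a supported route that does not write through a fixed pointer. The following is a minimal sketch, assuming .NET Core 2.1 or later (where string.Create is available) and assuming, purely for illustration, that each input byte is widened to one char. The class name FillDemo is hypothetical.

```csharp
using System;

public static class FillDemo
{
    // Builds the string in a single allocation: string.Create hands the
    // callback a writable Span<char> over the new string's storage before
    // the string is ever visible to other code.
    public static string GetForInputData(byte[] inputData)
    {
        return string.Create(inputData.Length, inputData, (span, data) =>
        {
            // Illustrative fill only: widen each byte to a char.
            for (int i = 0; i < data.Length; i++)
            {
                span[i] = (char)data[i];
            }
        });
    }

    public static void Main()
    {
        Console.WriteLine(GetForInputData(new byte[] { 72, 105 })); // Hi
    }
}
```

The string is only written while it is being constructed, so no instance that other code can already see (including an interned literal) is ever modified.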

answered Nov 16 '22 21:11 by drf


I had to experiment with this to confirm whether identical string literals point to the same memory location.

The results are:

string foo = "Fix value?"; //New address: 0x02b215f8
string foo2 = "Fix value?"; //Points to same address: 0x02b215f8
string fooCopy = string.Copy(foo); //New address: 0x021b2888

fixed (char* p = foo)
{
    p[9] = '!';
}

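Console.WriteLine(foo); // "Fix value!"
Console.WriteLine(foo2); // "Fix value!" (same interned instance as foo)
Console.WriteLine(fooCopy); // "Fix value?" (separate copy, unaffected)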

//References are equal, which means both variables refer to the same memory address
Console.WriteLine(string.ReferenceEquals(foo, foo2)); //true

//References are not equal, because string.Copy created another string at a new memory address
Console.WriteLine(string.ReferenceEquals(foo, fooCopy)); //false

We see that foo is initialized from a string literal stored at address 0x02b215f8 on my PC. Assigning the same literal to foo2 references the same address, while string.Copy creates a new string at a different address. string.ReferenceEquals() confirms this: foo and foo2 are the same reference, while foo and fooCopy are different references.

It is interesting to see that mutating a string literal in memory affects every other variable that references the same literal. This behavior exists, so it is something to be careful about.
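For contrast, here is a minimal sketch of "changing" a character without touching the shared literal: copy the characters out, edit the copy, and build a new string. The class name SafeEdit and the helper ReplaceAt are hypothetical names used only for illustration.

```csharp
using System;

public static class SafeEdit
{
    // Returns a new string with one character replaced; the input string
    // (and any other variable referencing it) is left untouched.
    public static string ReplaceAt(string s, int index, char c)
    {
        char[] chars = s.ToCharArray(); // private, mutable copy
        chars[index] = c;
        return new string(chars);
    }

    public static void Main()
    {
        string foo = "Fix value?";
        string foo2 = "Fix value?";

        string changed = ReplaceAt(foo, 9, '!');

        Console.WriteLine(changed); // Fix value!
        Console.WriteLine(foo2);    // Fix value?
    }
}
```

This costs one extra array and one extra string, which is the overhead the question wants to avoid, but foo2 keeps its original value.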

answered Nov 16 '22 21:11 by Joel Legaspi Enriquez