Can you change the contents of an (immutable) string via an unsafe method?

Tags:

c#

I know that strings are immutable and that any change to a string simply creates a new string in memory (leaving the old one to be garbage collected). However, I'm wondering whether my logic below is sound, in that you actually can, in a roundabout fashion, modify the contents of a string.

const string baseString = "The quick brown fox jumps over the lazy dog!";

//initialize a new string
string candidateString = new string('\0', baseString.Length);

//Pin the string
GCHandle gcHandle = GCHandle.Alloc(candidateString, GCHandleType.Pinned);

//Copy the contents of the base string to the candidate string
unsafe
{
    char* cCandidateString = (char*) gcHandle.AddrOfPinnedObject();
    for (int i = 0; i < baseString.Length; i++)
    {
        cCandidateString[i] = baseString[i];
    }
}

Does this approach indeed change the contents of candidateString (without creating a new candidateString in memory), or does the runtime see through my tricks and treat it as a normal string?

495

asked Sep 08 '15 18:09

Brian Mitchell

1 Answer

Your example works just fine, for two reasons:

candidateString lives in the managed heap, so it's safe to modify. Compare this with baseString, which is interned. If you try to modify the interned string, unexpected things may happen. There's no guarantee that the string won't live in write-protected memory at some point, although it seems to work today. That would be pretty similar to assigning a constant string to a char* variable in C and then modifying it. In C, that's undefined behavior.
You preallocate enough space in candidateString, so you're not overflowing the buffer.
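
One way to check that the write really happens in place is to keep a second reference to the same object and see whether it observes the new contents. The following is a minimal sketch rather than part of the original question: the class and method names are hypothetical, it must be compiled with unsafe code enabled, and it frees the handle when done, which the question's snippet leaves out.

```csharp
using System;
using System.Runtime.InteropServices;

public static class InPlaceWrite
{
    // Returns true when a second reference to the same string object
    // sees the characters written through the pinned pointer.
    public static unsafe bool SecondReferenceSeesWrite(string source)
    {
        string candidate = new string('\0', source.Length);
        string alias = candidate; // same object, no copy

        GCHandle handle = GCHandle.Alloc(candidate, GCHandleType.Pinned);
        try
        {
            char* p = (char*)handle.AddrOfPinnedObject();
            for (int i = 0; i < source.Length; i++)
                p[i] = source[i];
        }
        finally
        {
            handle.Free(); // unpin so the GC can move the object again
        }

        return alias == source;
    }
}
```

Because alias and candidate refer to the same object, the comparison can only succeed if the characters were changed in the existing string.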

Character data is not stored at offset 0 of the String class. It's stored at an offset equal to RuntimeHelpers.OffsetToStringData.

public static int OffsetToStringData
{
    // This offset is baked in by string indexer intrinsic, so there is no harm
    // in getting it baked in here as well.
    [System.Runtime.Versioning.NonVersionable] 
    get {
        // Number of bytes from the address pointed to by a reference to
        // a String to the first 16-bit character in the String.  Skip 
        // over the MethodTable pointer, & String 
        // length.  Of course, the String reference points to the memory 
        // after the sync block, so don't count that.  
        // This property allows C#'s fixed statement to work on Strings.
        // On 64 bit platforms, this should be 12 (8+4) and on 32 bit 8 (4+4).
#if WIN32
        return 8;
#else
        return 12;
#endif // WIN32
    }
}
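
As a rough check of the comment above, the reported offset can be compared against one pointer (the MethodTable) plus one 32-bit length field. This is only a sketch with hypothetical names. It assumes the CoreCLR or .NET Framework object layout described in the quoted source, and newer .NET versions mark OffsetToStringData as obsolete, so the compiler may emit a warning.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class StringLayout
{
    // True when the offset equals pointer size + sizeof(int),
    // i.e. 12 in a 64-bit process and 8 in a 32-bit one.
    public static bool OffsetMatchesComment()
    {
#pragma warning disable CS0618 // member is marked obsolete in newer .NET versions
        int offset = RuntimeHelpers.OffsetToStringData;
#pragma warning restore CS0618
        return offset == IntPtr.Size + sizeof(int);
    }
}
```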

Except...

GCHandle.AddrOfPinnedObject is special-cased for two kinds of types: strings and arrays. Instead of returning the address of the object itself, it lies and returns the address of the data (the first character or element), so the offset is already applied for you. See the source code in CoreCLR.

// Get the address of a pinned object referenced by the supplied pinned
// handle.  This routine assumes the handle is pinned and does not check.
FCIMPL1(LPVOID, MarshalNative::GCHandleInternalAddrOfPinnedObject, OBJECTHANDLE handle)
{
    FCALL_CONTRACT;

    LPVOID p;
    OBJECTREF objRef = ObjectFromHandle(handle);

    if (objRef == NULL)
    {
        p = NULL;
    }
    else
    {
        // Get the interior pointer for the supported pinned types.
        if (objRef->GetMethodTable() == g_pStringClass)
            p = ((*(StringObject **)&objRef))->GetBuffer();
        else if (objRef->GetMethodTable()->IsArray())
            p = (*((ArrayBase**)&objRef))->GetDataPtr();
        else
            p = objRef->GetData();
    }

    return p;
}
FCIMPLEND
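
The special case can be observed from managed code: for a string, the address returned by AddrOfPinnedObject should match the pointer that a fixed statement produces. The sketch below uses hypothetical names and needs unsafe code enabled. It only illustrates the behavior described above and is not taken from the runtime sources.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedAddress
{
    // Compares the address from a pinned GCHandle with the pointer
    // obtained by fixing the same string.
    public static unsafe bool HandleAndFixedAgree(string s)
    {
        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned);
        try
        {
            fixed (char* p = s)
            {
                return (IntPtr)p == handle.AddrOfPinnedObject();
            }
        }
        finally
        {
            handle.Free();
        }
    }
}
```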

In summary, the runtime lets you play with its data and doesn't complain. You're using unsafe code after all. I've seen worse runtime messing than that, including creating reference types on the stack ;-)

Just remember to add one additional \0 after all the characters if your final string is shorter than what's allocated. Writing a \0 at offset Length won't overflow either, because each string has an implicit null character at the end to ease interop scenarios.
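
A small sketch of both points, with hypothetical names and unsafe code enabled: the character at offset Length of a fixed string is the null terminator, and writing an earlier \0 does not change the Length that managed code sees. Only code that scans for a terminator, such as native code reached through interop, would treat the string as shorter.

```csharp
using System;

public static class Terminator
{
    // Reads the implicit terminator that follows the last character.
    public static unsafe bool EndsWithNull(string s)
    {
        fixed (char* p = s)
        {
            return p[s.Length] == '\0';
        }
    }

    // Writes a '\0' in the middle of a fresh string and reports Length.
    public static unsafe int LengthAfterEarlyNull()
    {
        string s = new string('x', 5);
        fixed (char* p = s)
        {
            p[2] = '\0';
        }
        return s.Length; // the stored length field is untouched
    }
}
```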

Now take a look at how StringBuilder creates a string. Here's StringBuilder.ToString:

[System.Security.SecuritySafeCritical]  // auto-generated
public override String ToString() {
    Contract.Ensures(Contract.Result<String>() != null);

    VerifyClassInvariant();

    if (Length == 0)
        return String.Empty;

    string ret = string.FastAllocateString(Length);
    StringBuilder chunk = this;
    unsafe {
        fixed (char* destinationPtr = ret)
        {
            do
            {
                if (chunk.m_ChunkLength > 0)
                {
                    // Copy these into local variables so that they are stable even in the presence of race conditions
                    char[] sourceArray = chunk.m_ChunkChars;
                    int chunkOffset = chunk.m_ChunkOffset;
                    int chunkLength = chunk.m_ChunkLength;

                    // Check that we will not overrun our boundaries. 
                    if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
                    {
                        fixed (char* sourcePtr = sourceArray)
                            string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
                    }
                    else
                    {
                        throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
                    }
                }
                chunk = chunk.m_ChunkPrevious;
            } while (chunk != null);
        }
    }
    return ret;
}

Yes, it uses unsafe code, and yes, you can optimize yours by using fixed, as this type of pinning is much more lightweight than allocating a GC handle:

const string baseString = "The quick brown fox jumps over the lazy dog!";

//initialize a new string
string candidateString = new string('\0', baseString.Length);

//Copy the contents of the base string to the candidate string
unsafe
{
    fixed (char* cCandidateString = candidateString)
    {
        for (int i = 0; i < baseString.Length; i++)
            cCandidateString[i] = baseString[i];
    }
}

When you use fixed, the GC only discovers that an object needs to be pinned when it stumbles upon it during a collection. If there's no collection going on, the GC isn't even involved. When you use GCHandle, a handle is registered with the GC each time.
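
If the goal is only to fill a freshly allocated string, newer runtimes (.NET Core 2.1 and later) also provide string.Create, which hands you a writable span over the new string's characters without any unsafe code. This is an alternative that the answer above does not discuss, sketched here with hypothetical names:

```csharp
using System;

public static class SafeFill
{
    // Allocates a string of the right length and lets the callback
    // write its characters before the string is returned.
    public static string CopyOf(string source)
    {
        return string.Create(source.Length, source,
            (span, src) => src.AsSpan().CopyTo(span));
    }
}
```

The callback runs before the string is visible to any other code, so the immutability guarantee is preserved for everyone else.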

162

answered Oct 04 '22 20:10

Lucas Trzesniewski
