Does string.Replace(string, string) create additional strings?

Tags: c#, .net

We have a requirement to transform a string containing a date in dd/mm/yyyy format to ddmmyyyy format. (In case you want to know why I am storing dates in a string: my software processes bulk transaction files, a line-based text file format used by a bank.)

And I am currently doing this:

string oldFormat = "01/01/2014";
string newFormat = oldFormat.Replace("/", "");

Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?


Here's the reason why I am asking this:

I am processing transaction files ranging in size from a few kilobytes to hundreds of megabytes. So far I have not had a performance or memory problem, because I am still testing with very small files. But when it comes to megabytes, I am not sure whether these additional strings will cause problems. I suspect they would, because strings are immutable. With millions of records, this additional memory consumption would build up considerably.

I am already using StringBuilders for output file creation. I also know that the discarded strings will be garbage collected (at some point before the end of time). I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string that does not create an additional string.

asked Feb 11 '26 14:02 by sampathsris


1 Answer

Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?

No, it doesn't create an intermediate string for each replacement. But it does create one new string for the result because, as you already know, strings are immutable.
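To make the point concrete, here is a small sketch (the class and method names are made up for illustration). It shows that Replace leaves the original string untouched and returns a separate string object for the result:

```csharp
using System;

public static class ReplaceDemo
{
    // Hypothetical helper: returns true when Replace handed back
    // a different object than the string it was called on.
    public static bool ReturnsNewInstance(string oldFormat)
    {
        string newFormat = oldFormat.Replace("/", "");

        Console.WriteLine(oldFormat); // the original is unchanged
        Console.WriteLine(newFormat); // the slashes are gone in the result

        return !ReferenceEquals(oldFormat, newFormat);
    }
}
```

Calling ReplaceDemo.ReturnsNewInstance("01/01/2014") prints 01/01/2014 followed by 01012014, so one extra string exists per call, not one per replaced slash.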

Why?

There is no reason to create a new string on each replacement. It is very simple to avoid, and avoiding it gives a significant performance boost.

If you are curious, referencesource.microsoft.com and the SSCLI 2.0 source code demonstrate this (see also the question "How to see code of method which marked as MethodImplOptions.InternalCall"):

FCIMPL3(Object*, COMString::ReplaceString, StringObject* thisRefUNSAFE,
        StringObject* oldValueUNSAFE, StringObject* newValueUNSAFE)
{
    // unnecessary code omitted

    // First pass: record the index of every occurrence of oldValue.
    while (((index = COMStringBuffer::LocalIndexOfString(thisBuffer, oldBuffer,
               thisLength, oldLength, index)) > -1) && (index <= endIndex - oldLength))
    {
        replaceIndex[replaceCount++] = index;
        index += oldLength;
    }

    if (replaceCount != 0)
    {
        // Calculate the new length of the string and ensure that we have
        // sufficient room.
        INT64 retValBuffLength = thisLength -
            ((oldLength - newLength) * (INT64)replaceCount);

        // Allocate the result string once, at its final length.
        gc.retValString = COMString::NewString((INT32)retValBuffLength);
        // unnecessary code omitted
    }
}

As you can see, all matches are counted first, and retValBuffLength is then calculated from replaceCount, so the result string is allocated once at its final length. The real implementation may be a bit different in .NET 4.0 (SSCLI 4.0 is not released), but I assure you it's not doing anything silly :-).
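The same two-pass idea can be sketched in managed code. This is only an illustration of the technique, not the framework's implementation; TwoPassReplace and its variable names are made up for the example. Unlike the native code, it also allocates a temporary char array before building the final string.

```csharp
using System;
using System.Collections.Generic;

public static class TwoPassReplace
{
    // Illustrative sketch only: NOT how string.Replace is actually implemented.
    public static string Replace(string source, string oldValue, string newValue)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(oldValue))
            throw new ArgumentException("oldValue must be non-empty", nameof(oldValue));
        newValue = newValue ?? string.Empty;

        // Pass 1: record the start index of every non-overlapping match.
        var matches = new List<int>();
        int index = 0;
        while ((index = source.IndexOf(oldValue, index, StringComparison.Ordinal)) >= 0)
        {
            matches.Add(index);
            index += oldValue.Length;
        }
        if (matches.Count == 0) return source;

        // The final length is known before any character is copied.
        int finalLength = source.Length
            - (oldValue.Length - newValue.Length) * matches.Count;
        var result = new char[finalLength];

        // Pass 2: copy the text between matches, inserting newValue at each match.
        int read = 0, write = 0;
        foreach (int match in matches)
        {
            int count = match - read;
            source.CopyTo(read, result, write, count);
            write += count;
            newValue.CopyTo(0, result, write, newValue.Length);
            write += newValue.Length;
            read = match + oldValue.Length;
        }
        source.CopyTo(read, result, write, source.Length - read);

        return new string(result);
    }
}
```

For example, TwoPassReplace.Replace("01/01/2014", "/", "") finds two matches, computes a final length of 8, and fills the buffer in a single sweep. No partially replaced string such as "0101/2014" ever exists.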

I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string, that does not additionally create a string.

Yes: use a reusable StringBuilder with a capacity of ~2000 characters, so that no memory is allocated per record. This holds only if the replacement lengths are equal, and it can give you a nice performance gain if you are in a tight loop.

Before writing anything, run benchmarks with big files and see whether the performance is good enough for you. If it is, don't change anything.
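A rough sketch of the reusable-buffer idea follows. DateFieldWriter, its fields, and the 2000-character limit are assumptions made for this example. Whether StringBuilder.Replace avoids internal allocations depends on the runtime, so treat this as something to benchmark rather than a guaranteed win.

```csharp
using System;
using System.IO;
using System.Text;

public static class DateFieldWriter
{
    // Assumption for this sketch: no field is longer than 2000 characters.
    private const int MaxFieldLength = 2000;

    // Reused for every call, so this class is not thread-safe.
    private static readonly StringBuilder Scratch = new StringBuilder(MaxFieldLength);
    private static readonly char[] Buffer = new char[MaxFieldLength];

    // Writes 'field' to 'output' with all slashes removed,
    // without building a new string for the result.
    public static void WriteWithoutSlashes(string field, TextWriter output)
    {
        if (field == null) throw new ArgumentNullException(nameof(field));
        if (output == null) throw new ArgumentNullException(nameof(output));
        if (field.Length > MaxFieldLength)
            throw new ArgumentException("field is longer than the scratch buffer", nameof(field));

        Scratch.Clear();          // keeps the existing capacity
        Scratch.Append(field);
        Scratch.Replace("/", ""); // in-place replacement inside the builder

        int length = Scratch.Length;
        Scratch.CopyTo(0, Buffer, 0, length); // copy out without calling ToString()
        output.Write(Buffer, 0, length);
    }
}
```

In a processing loop, this would be called once per date field with the writer for the output file, for example DateFieldWriter.WriteWithoutSlashes("01/01/2014", writer).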
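A crude way to get a first number is a Stopwatch loop like the one below. ReplaceBenchmark and the record count are hypothetical, and a loop over one constant string is only a stand-in for a run against real transaction files.

```csharp
using System;
using System.Diagnostics;

public static class ReplaceBenchmark
{
    // Times 'recordCount' calls to string.Replace and reports how many
    // generation 0 garbage collections happened in the meantime.
    public static TimeSpan Measure(int recordCount)
    {
        string oldFormat = "01/01/2014";
        long totalLength = 0; // consume the results so the work is not skipped

        int gen0Before = GC.CollectionCount(0);
        Stopwatch timer = Stopwatch.StartNew();

        for (int i = 0; i < recordCount; i++)
        {
            totalLength += oldFormat.Replace("/", "").Length;
        }

        timer.Stop();
        int gen0Collections = GC.CollectionCount(0) - gen0Before;

        Console.WriteLine("{0} records, {1} ms, {2} gen 0 collections, {3} chars produced",
            recordCount, timer.ElapsedMilliseconds, gen0Collections, totalLength);

        return timer.Elapsed;
    }
}
```

Calling ReplaceBenchmark.Measure(10000000) gives a rough idea of the cost for ten million records. The numbers vary between machines and runtimes, so compare alternatives on the same setup.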

answered Feb 13 '26 05:02 by Erti-Chris Eelmaa


