Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory Efficiency and Performance of String.Replace .NET Framework

Tags:

string

c#

.net

 string str1 = "12345ABC...\\...ABC100000";   // Hypothetically huge string of 100000 + Unicode Chars  str1 = str1.Replace("1", string.Empty);  str1 = str1.Replace("22", string.Empty);  str1 = str1.Replace("656", string.Empty);  str1 = str1.Replace("77ABC", string.Empty);   // ...  this replace anti-pattern might happen with upto 50 consecutive lines of code.   str1 = str1.Replace("ABCDEFGHIJD", string.Empty); 

I have inherited some code that does the same as the snippet above. It takes a huge string and replaces (removes) constant smaller strings from the large string.

I believe this is a very memory intensive process given that new large immutable strings are being allocated in memory for each replace, awaiting death via the GC.

1. What is the fastest way of replacing these values, ignoring memory concerns?

2. What is the most memory efficient way of achieving the same result?

I am hoping that these are the same answer!

Practical solutions that fit somewhere in between these goals are also appreciated.

Assumptions:

  • All replacements are constant and known in advance
  • Underlying characters do contain some unicode [non-ascii] chars
like image 798
nick_alot Avatar asked Dec 30 '08 08:12

nick_alot


2 Answers

If you want to be really fast, and I mean really fast you'll have to look beyond the StringBuilder and just write well optimized code.

One thing your computer doesn't like to do is branching, if you can write a replace method which operates on a fixed array (char *) and doesn't branch you have great performance.

What you'll be doing is that the replace operation is going to search for a sequence of characters and if it finds any such sub string it will replace it. In effect you'll copy the string and when doing so, preform the find and replace.

You'll rely on these functions for picking the index of some buffer to read/write. The goal is to preform the replace method such that when nothing has to change you write junk instead of branching.

You should be able to complete this without a single if statement and remember to use unsafe code. Otherwise you'll be paying for index checking for every element access.

unsafe {     fixed( char * p = myStringBuffer )     {         // Do fancy string manipulation here     } } 

I've written code like this in C# for fun and seen significant performance improvements, almost 300% speed up for find and replace. While the .NET BCL (base class library) performs quite well it is riddled with branching constructs and exception handling this will slow down you code if you use the built-in stuff. Also these optimizations while perfectly sound are not preformed by the JIT-compiler and you'll have to run the code as a release build without any debugger attached to be able to observe the massive performance gain.

I could provide you with more complete code but it is a substantial amount of work. However, I can guarantee you that it will be faster than anything else suggested so far.

like image 20
John Leidegren Avatar answered Oct 01 '22 17:10

John Leidegren


All characters in a .NET string are "unicode chars". Do you mean they're non-ascii? That shouldn't make any odds - unless you run into composition issues, e.g. an "e + acute accent" not being replaced when you try to replace an "e acute".

You could try using a regular expression with Regex.Replace, or StringBuilder.Replace. Here's sample code doing the same thing with both:

using System; using System.Text; using System.Text.RegularExpressions;  class Test {     static void Main(string[] args)     {         string original = "abcdefghijkl";          Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);          string removedByRegex = regex.Replace(original, "");         string removedByStringBuilder = new StringBuilder(original)             .Replace("a", "")             .Replace("c", "")             .Replace("e", "")             .Replace("g", "")             .Replace("i", "")             .Replace("k", "")             .ToString();          Console.WriteLine(removedByRegex);         Console.WriteLine(removedByStringBuilder);     } } 

I wouldn't like to guess which is more efficient - you'd have to benchmark with your specific application. The regex way may be able to do it all in one pass, but that pass will be relatively CPU-intensive compared with each of the many replaces in StringBuilder.

like image 123
Jon Skeet Avatar answered Oct 01 '22 15:10

Jon Skeet