C# Extension method slower than chained Replace unless in tight loop. Why?

Tags:

I have an extension method to remove certain characters from a string (a phone number) which is performing much slower than I think it should vs chained Replace calls. The weird bit, is that in a loop it overtakes the Replace thing if the loop runs for around 3000 iterations, and after that it's faster. Lower than that and chaining Replace is faster. It's like there's a fixed overhead to my code which Replace doesn't have. What could this be!?

Quick look. When only testing 10 numbers, mine takes about 0.3ms, while Replace takes only 0.01ms. A massive difference! But when running 5 million, mine takes around 1700ms while Replace takes about 2500ms.

Phone numbers will only have 0-9, +, -, (, )

Here's the relevant code: Building test cases, I'm playing with testNums.

        int testNums = 5_000_000;
        Console.WriteLine("Building " + testNums + " tests");
        Random rand = new Random();
        string[] tests = new string[testNums];
        char[] letters =
        {
            '0','1','2','3','4','5','6','7','8','9',
            '+','-','(',')'
        };
        for(int t = 0; t < tests.Length; t++)
        {
            int length = rand.Next(5, 20);
            char[] word = new char[length];
            for(int c = 0; c < word.Length; c++)
            {
                word[c] = letters[rand.Next(letters.Length)];
            }
            tests[t] = new string(word);
        }

        Console.WriteLine("Tests built");
        string[] stripped = new string[tests.Length];

Using my extension method:

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < stripped.Length; i++)
        {
            stripped[i] = tests[i].CleanNumberString();
        }
        stopwatch.Stop();
        Console.WriteLine("Clean: " + stopwatch.Elapsed.TotalMilliseconds + "ms");

Using chained Replace:

        stripped = new string[tests.Length];
        stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < stripped.Length; i++)
        {
            stripped[i] = tests[i].Replace(" ", string.Empty)
                        .Replace("-", string.Empty)
                        .Replace("(", string.Empty)
                        .Replace(")", string.Empty)
                        .Replace("+", string.Empty);
        }
        stopwatch.Stop();
        Console.WriteLine("Replace: " + stopwatch.Elapsed.TotalMilliseconds + "ms");

Extension method in question:

    public static string CleanNumberString(this string s)
    {
        Span<char> letters = stackalloc char[s.Length];
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] >= '0' && s[i] <= '9')
                letters[count++] = s[i];
        }
        return new string(letters.Slice(0, count));
    }

What I've tried:

I've run them around the other way. Makes a tiny difference, but not enough.
Make it a normal static method, which was significantly slower than extension. As a ref parameter was slightly slower, and in parameter was about the same as extension method.
Aggressive Inlining. Doesn't make any real difference. I'm in release mode, so I suspect the compiler inlines it anyway. Either way, not much change.

I have also looked at memory allocations, and that's as I expect. My one allocates on the managed heap only one string per iteration (the new string at the end) which Replace allocates a new object for each Replace. So the memory used by the Replace one is much, higher. But it's still faster!

Is it calling native C code and doing something crafty there? Is the higher memory usage triggering the GC and slowing it down (still doesn't explane the insanely fast time on only one or two iterations)

Any ideas?

(Yes, I know not to bother optimising things like this, it's just bugging me because I don't know why it's doing this)

637

asked Feb 25 '19 21:02

craig

1 Answers

After doing some benchmarks, I think can safely assert that your initial statement is wrong for the exact reason you mentionned in your deleted answer: the loading time of the method is the only thing that misguided you.

Here's the full benchmark on a simplified version of the problem:

static void Main(string[] args)
{
    // Build string of n consecutive "ab"
    int n = 1000;
    Console.WriteLine("N: " + n);
    char[] c = new char[n];

    for (int i = 0; i < n; i+=2)
        c[i] = 'a';
    for (int i = 1; i < n; i += 2)
        c[i] = 'b';

    string s = new string(c);

    Stopwatch stopwatch;

    // Make sure everything is loaded
    s.CleanNumberString();
    s.Replace("a", "");
    s.UnsafeRemove();

    // Tests to remove all 'a' from the string

    // Unsafe remove
    stopwatch = Stopwatch.StartNew();

    string a1 = s.UnsafeRemove();

    stopwatch.Stop();
    Console.WriteLine("Unsafe remove:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");

    // Extension method
    stopwatch = Stopwatch.StartNew();

    string a2 = s.CleanNumberString();

    stopwatch.Stop();
    Console.WriteLine("Clean method:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");

    // String replace
    stopwatch = Stopwatch.StartNew();

    string a3 = s.Replace("a", "");

    stopwatch.Stop();
    Console.WriteLine("String.Replace:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");

    // Make sure the returned strings are identical
    Console.WriteLine(a1.Equals(a2) && a2.Equals(a3));

    Console.ReadKey();

}

public static string CleanNumberString(this string s)
{
    char[] letters = new char[s.Length];
    int count = 0;
    for (int i = 0; i < s.Length; i++)
        if (s[i] == 'b')
            letters[count++] = 'b';
    return new string(letters.SubArray(0, count));
}

public static T[] SubArray<T>(this T[] data, int index, int length)
{
    T[] result = new T[length];
    Array.Copy(data, index, result, 0, length);
    return result;
}

// Taken from https://stackoverflow.com/a/2183442/6923568
public static unsafe string UnsafeRemove(this string s)
{
    int len = s.Length;
    char* newChars = stackalloc char[len];
    char* currentChar = newChars;

    for (int i = 0; i < len; ++i)
    {
        char c = s[i];
        switch (c)
        {
            case 'a':
                continue;
            default:
                *currentChar++ = c;
                break;
        }
    }
    return new string(newChars, 0, (int)(currentChar - newChars));
}

When ran with different values of n, it is clear that your extension method (or at least my somewhat equivalent version of it) has a logic that makes it faster than String.Replace(). In fact, it is more performant on either small or big strings:

N: 100
Unsafe remove: 0,0024ms
Clean method: 0,0015ms
String.Replace: 0,0021ms
True

N: 100000
Unsafe remove: 0,3889ms
Clean method: 0,5308ms
String.Replace: 1,3993ms
True

I highly suspect optimizations for the replacement of strings (not to be compared to removal) in String.Replace() to be the culprit here. I also added a method from this answer to have another comparison on removal of characters. That one's times behave similarly to your method but gets faster on higher values (80k+ on my tests) of n.

With all that being said, since your question is based on an assumption that we found was false, if you need more explanation on why the opposite is true (i.e. "Why is String.Replace() slower than my method"), plenty of in-depth benchmarks about string manipulation already do so.

119

answered Sep 30 '22 00:09

Mat

Related questions
                            
                                EF Core Eager Loading nested collections
                            
                                Blazor/Electron : Handling close event
                            
                                Is unused code included in a release assembly?
                            
                                Some projects' dependencies contain unexplained code analyzers
                            
                                What is the conversion parameter of Expression.Coalesce for?
                            
                                Multiple hubs to one connection to prevent god class
                            
                                Can't get any cookies with C# HttpClient
                            
                                Analog of CreatedAtAction for update operation
                            
                                Rider single-click to open file
                            
                                Get ErrorMessage from Response in Netcore 2.2 Web API
                            
                                ASP.NET Core 2 SignalR Hub to receive complex object instead of string
                            
                                What's the difference between two opposite modelBuilder relationship settings?
                            
                                Where should TreeView's Invoke method between two WinForms be handled?
                            
                                How to return ActionResult (file) from class that does not derive from Controller?
                            
                                C#: volatile reads and writes of HashSet
                            
                                page.DataContext not inherited from parent Frame?
                            
                                Does setting an item null in a .NET List<T> make it available for Garbage Collection and is it a good idea?
                            
                                The type or namespace name 'HttpGet' could not be found when add 'System.Web.Http' namespace
                            
                                Difference OnInit and OnLoad in ASP.NET?
                            
                                C# int to Flag Enum [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

C# Extension method slower than chained Replace unless in tight loop. Why?

Tags:

c#

optimization

craig

People also ask

1 Answers

Mat

Recent Activity

Donate For Us