During application profiling I found that function checking for pattern match is very slow. It is written using LINQ. Simple replacement of this LINQ expression with loop makes really huge difference. What is it? Is LINQ really such a bad thing and works so slow or I misunderstand something? <pre class="prettyprint"><code>private static bool PatternMatch1(byte[] buffer, int position, string pattern) { int i = 0; foreach (char c in pattern) { if (buffer[position + i++] != c) { return false; } } return true; } </code></pre> version 2 with LINQ (suggested by Resharper) <pre class="prettyprint"><code>private static bool PatternMatch2(byte[] buffer, int position, string pattern) { int i = 0; return pattern.All(c => buffer[position + i++] == c); } </code></pre> version 3 with LINQ <pre class="prettyprint"><code>private static bool PatternMatch3(byte[] buffer, int position, string pattern) { return !pattern.Where((t, i) => buffer[position + i] != t).Any(); } </code></pre> version 4 using lambda <pre class="prettyprint"><code>private static bool PatternMatch4(byte[] buffer, int position, string pattern, Func<char, byte, bool> predicate) { int i = 0; foreach (char c in pattern) { if (predicate(c, buffer[position + i++])) { return false; } } return true; } </code></pre> and here is the usage with large buffer <pre class="prettyprint"><code>const int SIZE = 1024 * 1024 * 50; byte[] buffer = new byte[SIZE]; for (int i = 0; i < SIZE - 3; ++i) { if (PatternMatch1(buffer, i, "xxx")) { Console.WriteLine(i); } } </code></pre> Calling <code>PatternMatch2</code> or <code>PatternMatch3</code> is phenomenally slow. It takes about 8 seconds for <code>PatternMatch3</code> and about 4 seconds for <code>PatternMatch2</code>, while calling <code>PatternMatch1</code> takes about 0.6. As far as I understand it is the same code and I see no difference. Any ideas?

Mark Byers and Marco Mp are right about the extra calls overhead. However, there is another reason here: many object allocations, due to a closure. The compiler creates a new object on each iteration, storing the current values of <code>buffer</code>, <code>position</code> and <code>i</code> that the predicate can use. Here is a dotTrace snapshot of <code>PatternMatch2</code> showing this. <code><>c_DisplayClass1</code> is the compiler-generated class. (Note that the numbers are huge because I'm using tracing profiling, which adds overhead. However, it's the same overhead per call, so the overall percentages are correct). <img src="https://i.stack.imgur.com/bZVMM.png" alt="enter image description here">

Well let's take the Where operator. It's implementation is almost(*) like: <pre class="prettyprint"><code>public IEnumerable<T> Where(this IEnumerable<T> input, Func<T, bool> fn) { foreach(T i in input) if (fn(i)) yield return i; } </code></pre> This means, for each loop over the IEnumerable an iterator object is created - note that you have SIZE - n of these allocations because you do those many LINQ queries. Then for each character in the pattern you have: <ol> <li>A function call to the enumerator</li> <li>A function call to the predicate</li> </ol> The second is a call to a delegate which costs roughly double the calling cost of a typical virtual call (where in the first version you have no extra calls apart the array deindexing. If you really want brute performance, you probably want to get as "older-style" code as possible. In this case I would even replace that foreach in method 1 with a plain for (at the very least if it does not serve to optimize, it makes it more readable since you are keeping track of the index anyway). It's also more readable in this case, and it shows that Resharper suggestions are sometimes debatable. (*) I said almost because it uses a proxy method to check that the input enumerable is not null and throw the exception earlier than the time the collection is enumerated -- it's a small detail which does not invalidate what I wrote before, highlighting here for completeness.

LINQ query is slow

Tags:

performance

c#

.net

linq

During application profiling I found that function checking for pattern match is very slow. It is written using LINQ. Simple replacement of this LINQ expression with loop makes really huge difference. What is it? Is LINQ really such a bad thing and works so slow or I misunderstand something?

Click to copy

private static bool PatternMatch1(byte[] buffer, int position, string pattern)
{
    int i = 0;

    foreach (char c in pattern)
    {
        if (buffer[position + i++] != c)
        {
            return false;
        }
    }

    return true;
}

version 2 with LINQ (suggested by Resharper)

Click to copy

private static bool PatternMatch2(byte[] buffer, int position, string pattern)
{
    int i = 0;
    return pattern.All(c => buffer[position + i++] == c);
}

version 3 with LINQ

Click to copy

private static bool PatternMatch3(byte[] buffer, int position, string pattern)
{
    return !pattern.Where((t, i) => buffer[position + i] != t).Any();
}

version 4 using lambda

Click to copy

private static bool PatternMatch4(byte[] buffer, int position, string pattern, Func<char, byte, bool> predicate)
{
    int i = 0;

    foreach (char c in pattern)
    {
        if (predicate(c, buffer[position + i++]))
        {
            return false;
        }
    }

    return true;
}

and here is the usage with large buffer

Click to copy

const int SIZE = 1024 * 1024 * 50;

byte[] buffer = new byte[SIZE];

for (int i = 0; i < SIZE - 3; ++i)
{
    if (PatternMatch1(buffer, i, "xxx"))
    {
        Console.WriteLine(i);
    }
}

Calling PatternMatch2 or PatternMatch3 is phenomenally slow. It takes about 8 seconds for PatternMatch3 and about 4 seconds for PatternMatch2, while calling PatternMatch1 takes about 0.6. As far as I understand it is the same code and I see no difference. Any ideas?

213

asked Oct 10 '12 12:10

axe

4 Answers

Mark Byers and Marco Mp are right about the extra calls overhead. However, there is another reason here: many object allocations, due to a closure.

The compiler creates a new object on each iteration, storing the current values of buffer, position and i that the predicate can use. Here is a dotTrace snapshot of PatternMatch2 showing this. <>c_DisplayClass1 is the compiler-generated class.

(Note that the numbers are huge because I'm using tracing profiling, which adds overhead. However, it's the same overhead per call, so the overall percentages are correct).

enter image description here

114

answered Oct 01 '22 18:10

Julien Lebosquain

Well let's take the Where operator.

It's implementation is almost(*) like:

Click to copy

public IEnumerable<T> Where(this IEnumerable<T> input, Func<T, bool> fn)
{
    foreach(T i in input) 
        if (fn(i))
            yield return i;
}

This means, for each loop over the IEnumerable an iterator object is created - note that you have SIZE - n of these allocations because you do those many LINQ queries.

Then for each character in the pattern you have:

A function call to the enumerator
A function call to the predicate

The second is a call to a delegate which costs roughly double the calling cost of a typical virtual call (where in the first version you have no extra calls apart the array deindexing.

If you really want brute performance, you probably want to get as "older-style" code as possible. In this case I would even replace that foreach in method 1 with a plain for (at the very least if it does not serve to optimize, it makes it more readable since you are keeping track of the index anyway).

It's also more readable in this case, and it shows that Resharper suggestions are sometimes debatable.

(*) I said almost because it uses a proxy method to check that the input enumerable is not null and throw the exception earlier than the time the collection is enumerated -- it's a small detail which does not invalidate what I wrote before, highlighting here for completeness.

answered Oct 01 '22 17:10

Marco Mp

The difference is that the LINQ version has extra function calls. Internally in LINQ your lambda function is called in a loop.

While the extra call might be optimized away by the JIT compiler, it's not guaranteed and it can add a small overhead. In most cases the extra overhead won't matter, but because your function is very simple and is called an extremely large number of times, even a small overhead can quickly add up. In my experience LINQ code generally is a little bit slower than equivalent for loops. That's the performance price you often pay for a more high level syntax.

In this situation, I would recommend sticking with the explicit loop.

While you're optimizing this section of code, you might also want to consider a more efficient searching algorithm. Your algorithm is worst case O(n*m) but better algorithms exist.

answered Oct 01 '22 17:10

Mark Byers

LINQ's main goal when applied to collections is its simplicity. If you want performance, you should avoid LINQ entirely. Also to enumerate an array you just increment an index variable, while LINQ needs to set up an entire enumerator object.

answered Oct 01 '22 16:10

Wormbo

Related questions
                            
                                Is returning a reference from static method thread-safe?
                            
                                How to return a static class instance in c#
                            
                                Simple Injector: Inject same UnitOfWork instance across services of the same graph
                            
                                C# : the close method of Xml.Load(file)
                            
                                First split then join a subset of a string
                            
                                const vs. readonly
                            
                                Add third party dll reference in ssis script component
                            
                                Do you have to call EndInvoke (or define a callback ) for asynchronous method calls even if you don't have anything to do when it returns
                            
                                Static keyword proper usage?
                            
                                How to vertically auto size a winforms datagridview control, so that its rows are always visible
                            
                                Python hmac and C# hmac
                            
                                How to join together all the elements in an IEnumerable of IEnumerables?
                            
                                Convert List<string> to List<KeyValuePair<string, string>> using Linq [duplicate]
                            
                                Is there any need to close a DbConnection if a using clause is used? [duplicate]
                            
                                How to detect if any key is pressed
                            
                                How to debug "The type initializer for 'my class' threw an exception"
                            
                                How to do pattern /shape match / recognition on a board (20x20)?
                            
                                Is it possible to bind an array to DataGridView control?
                            
                                Convert a Generic List to specific type
                            
                                Write in stacktrace

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

LINQ query is slow

Tags:

performance

c#

.net

linq

axe

People also ask

4 Answers

Julien Lebosquain

Marco Mp

Mark Byers

Wormbo

Recent Activity

Donate For Us