Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq + foreach loop optimization

So I recently found myself writing a loop similar to this one:

        var headers = new Dictionary<string, string>();
        ...
        foreach (var header in headers)
        {
            if (String.IsNullOrEmpty(header.Value)) continue;
            ...
        }

Which works fine, it iterates through the dictionary once and does all I need it to do. However, my IDE is suggesting this as a more readable / optimized alternative, but I disagree:

        var headers = new Dictionary<string, string>();
        ...
        foreach (var header in headers.Where(header => !String.IsNullOrEmpty(header.Value)))
        {
            ...
        }

But wont that iterate through the dictionary twice? Once to evaluate the .Where(...) and then once for the for-each loop?

If not, and the second code example only iterates the dictionary once, please explain why and how.

like image 708
caesay Avatar asked Jun 08 '12 03:06

caesay


1 Answers

The code with continue is about twice as fast.

I ran the following code in LINQPad, and the results consistently say that the clause with continue is twice as fast.

void Main()
{
    var headers = Enumerable.Range(1,1000).ToDictionary(i => "K"+i,i=> i % 2 == 0 ? null : "V"+i);
    var stopwatch = new Stopwatch(); 
    var sb = new StringBuilder();

    stopwatch.Start();

    foreach (var header in headers.Where(header => !String.IsNullOrEmpty(header.Value)))
        sb.Append(header);
    stopwatch.Stop();
    Console.WriteLine("Using LINQ : " + stopwatch.ElapsedTicks);

    sb.Clear();
    stopwatch.Reset();

    stopwatch.Start();
    foreach (var header in headers)
    {
        if (String.IsNullOrEmpty(header.Value)) continue;
        sb.Append(header);
    }
    stopwatch.Stop();

    Console.WriteLine("Using continue : " + stopwatch.ElapsedTicks);

}

Here are some of the results I got

Using LINQ : 1077
Using continue : 348

Using LINQ : 939
Using continue : 459

Using LINQ : 768
Using continue : 382

Using LINQ : 1256
Using continue : 457

Using LINQ : 875
Using continue : 318

In general LINQ is always going to be slower when working with an already evaluated IEnumerable<T>, than the foreach counterpart. The reason is that LINQ-to-Objects is just a high-level wrapper of these lower level language features. The benefit to using LINQ here is not performance, but the provision of a consistent interface. LINQ absolutely does provide performance benefits, but they come into play when you are working with resources that are not already in active memory (and allow you to leverage the ability to optimize the code that is actually executed). When the alternative code is the most optimal alternative, then LINQ just has to go through a redundant process to call the same code you would have written anyway. To illustrate this, I'm going to paste the code below that is actually called when you use LINQ's Where operator on a loaded enumerable:

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    if (predicate == null)
    {
        throw Error.ArgumentNull("predicate");
    }
    if (source is Iterator<TSource>)
    {
        return ((Iterator<TSource>) source).Where(predicate);
    }
    if (source is TSource[])
    {
        return new WhereArrayIterator<TSource>((TSource[]) source, predicate);
    }
    if (source is List<TSource>)
    {
        return new WhereListIterator<TSource>((List<TSource>) source, predicate);
    }
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

And here is the WhereSelectEnumerableIterator<TSource,TResult> class. The predicate field is the delegate that you pass into the Where() method. You will see where it actually gets executed in the MoveNext method (as well as all the redundant null checks). You will also see that the enumerable is only looped through once. Stacking where clauses will result in the creation of multiple iterator classes (wrapping their predecessors), but will not result in multiple enumeration actions (due to deferred execution). Keep in mind that when you write a Lambda like this, you are also actually creating a new Delegate instance (also affecting your performance in a minor way).

private class WhereSelectEnumerableIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
    private IEnumerator<TSource> enumerator;
    private Func<TSource, bool> predicate;
    private Func<TSource, TResult> selector;
    private IEnumerable<TSource> source;

    public WhereSelectEnumerableIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate, Func<TSource, TResult> selector)
    {
        this.source = source;
        this.predicate = predicate;
        this.selector = selector;
    }

    public override Enumerable.Iterator<TResult> Clone()
    {
        return new Enumerable.WhereSelectEnumerableIterator<TSource, TResult>(this.source, this.predicate, this.selector);
    }

    public override void Dispose()
    {
        if (this.enumerator != null)
        {
            this.enumerator.Dispose();
        }
        this.enumerator = null;
        base.Dispose();
    }

    public override bool MoveNext()
    {
        switch (base.state)
        {
            case 1:
                this.enumerator = this.source.GetEnumerator();
                base.state = 2;
                break;

            case 2:
                break;

            default:
                goto Label_007C;
        }
        while (this.enumerator.MoveNext())
        {
            TSource current = this.enumerator.Current;
            if ((this.predicate == null) || this.predicate(current))
            {
                base.current = this.selector(current);
                return true;
            }
        }
        this.Dispose();
    Label_007C:
        return false;
    }

    public override IEnumerable<TResult2> Select<TResult2>(Func<TResult, TResult2> selector)
    {
        return new Enumerable.WhereSelectEnumerableIterator<TSource, TResult2>(this.source, this.predicate, Enumerable.CombineSelectors<TSource, TResult, TResult2>(this.selector, selector));
    }

    public override IEnumerable<TResult> Where(Func<TResult, bool> predicate)
    {
        return (IEnumerable<TResult>) new Enumerable.WhereEnumerableIterator<TResult>(this, predicate);
    }
}

I personally think the performance difference is completely justifiable, because LINQ code is much easier to maintain and reuse. I also do things to offset the performance issues (like declaring all my anonymous lambda delegates and expressions as static readonly fields in a common class). But in reference to your actual question, your continue clause is definitely faster than the LINQ alternative.

like image 184
smartcaveman Avatar answered Oct 09 '22 14:10

smartcaveman