Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq to objects: inner query performance

During answering on one of questions I saw 2 examples of LINQ code which should work exactly same. But I was wonder about performance, and found that one code much faster that another code. And I cannot understand why.

I took datastructures from question

public struct Strc
{
    public decimal A;
    public decimal B;
    // more stuff
}

public class CLASS
{
    public List<Strc> listStrc = new List<Strc>();
    // other stuff
}

then I wrote simple benchmark tests (used benchmarkdotnet library)

UPD I included all tests which was requested

public class TestCases
{
    private Dictionary<string, CLASS> dict;

    public TestCases()
    {
        var m = 100;
        var n = 100;

        dict = Enumerable.Range(0, m)
                .Select(x => new CLASS()
                {
                    listStrc = Enumerable.Range(0, n)
                        .Select(y => new Strc() { A = y % 4, B = y }).ToList()
                })
                .ToDictionary(x => Guid.NewGuid().ToString(), x => x);
    }

Greater than 3 tests

    [Benchmark]
    public void TestJon_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => ls.A > 3)
            .Select(ls => ls.B).ToArray();
    }

    [Benchmark]
    public void TestTym_Gt3()
    {
        var result = dict.Values
                .SelectMany(x => x.listStrc.Where(l => l.A > 3))
                .Select(x => x.B).ToArray();
    }


    [Benchmark]
    public void TestDasblinkenlight_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Select(v => v))
            .Where(l => l.A > 3)
            .Select(ls => ls.B).ToArray();
    }


    [Benchmark]
    public void TestIvan_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => l.A > 3).Select(l => l.B))
            .ToArray();
    }

Return true tests

    [Benchmark]
    public void TestJon_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => true)
            .Select(ls => ls.B).ToArray();
    }

    [Benchmark]
    public void TestTym_True()
    {
        var result = dict.Values
                .SelectMany(x => x.listStrc.Where(l => true))
                .Select(x => x.B).ToArray();
    }

    [Benchmark]
    public void TestDasblinkenlight_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Select(v => v))
            .Where(ls => true)
            .Select(ls => ls.B).ToArray();
    }


    [Benchmark]
    public void TestIvan_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => true).Select(l => l.B))
            .ToArray();
    }
}

I ran those tests

static void Main(string[] args)
{
    var summary = BenchmarkRunner.Run<TestCases>();        
}

and got results

// * Summary *

BenchmarkDotNet=v0.10.9, OS=Windows 7 SP1 (6.1.7601)
Processor=Intel Core i7-4770 CPU 3.40GHz (Haswell), ProcessorCount=8
Frequency=3312841 Hz, Resolution=301.8557 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0
  DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0


                   Method |       Mean |      Error |     StdDev |
------------------------- |-----------:|-----------:|-----------:|
              TestJon_Gt3 |   655.1 us |  1.3408 us |  1.2542 us |
              TestTym_Gt3 |   353.1 us | 12.9535 us | 10.8167 us |
  TestDasblinkenlight_Gt3 |   943.9 us |  1.9563 us |  1.7342 us |
             TestIvan_Gt3 |   352.6 us |  0.7216 us |  0.6397 us |
             TestJon_True |   801.8 us |  2.7194 us |  2.2708 us |
             TestTym_True | 1,055.8 us |  3.0912 us |  2.7403 us |
 TestDasblinkenlight_True | 1,090.6 us |  2.3084 us |  2.1593 us |
            TestIvan_True |   677.7 us |  3.0427 us |  2.8461 us |

// * Hints *
Outliers
  TestCases.TestTym_Gt3: Default             -> 2 outliers were removed
  TestCases.TestDasblinkenlight_Gt3: Default -> 1 outlier  was  removed
  TestCases.TestIvan_Gt3: Default            -> 1 outlier  was  removed
  TestCases.TestJon_True: Default            -> 2 outliers were removed
  TestCases.TestTym_True: Default            -> 1 outlier  was  removed

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 us   : 1 Microsecond (0.000001 sec)

I tried to change initial data (n and m parameters), but results was stable, TestTym was faster than TestJon each time. And TestIvan is semms fastest from all tests. I just want to understand, why it faster? Or maybe I did smthg wrong during testing?

like image 703
tym32167 Avatar asked Oct 02 '17 10:10

tym32167


People also ask

Is LINQ good for performance?

LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code. And if you'd like to measure the performance difference, you can use a tool like BenchmarkDotNet to do so.

Which is faster LINQ or SQL?

More importantly: when it comes to querying databases, LINQ is in most cases a significantly more productive querying language than SQL. Compared to SQL, LINQ is simpler, tidier, and higher-level.

What is Dynamic LINQ?

The Dynamic LINQ library exposes a set of extension methods on IQueryable corresponding to the standard LINQ methods at Queryable, and which accept strings in a special syntax instead of expression trees.


1 Answers

Since ultimately both expressions filter out all items, the time difference is due to the different number of times an intermediate iterator returns a value in the combined chain of statements.

To understand what is going on consider the implementation of SelectMany from the reference source, with arguments checking removed:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
    return SelectManyIterator<TSource, TResult>(source, selector);
}
static IEnumerable<TResult> SelectManyIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
    foreach (TSource element in source) {
        foreach (TResult subElement in selector(element)) {
            yield return subElement;
        }
    }
}

Select is implemented with a series of different iterators based on the type of collection being enumerated - WhereSelectArrayIterator, WhereSelectListIterator, or WhereSelectEnumerableIterator.

Your test code generates cases in which As are in the range from zero to three, inclusive:

Select(y => new Strc() { A = y % 4, B = y })
//                       ^^^^^^^^^

Therefore, condition Where(ls => ls.A > 3) produces no matches.

In the TestJon example yield return inside SelectMany is hit 10,000 times, because everything is selected prior to filtering. After that Select uses WhereSelectEnumerableIterator, which finds no matches. The number of times the iterator returns a value in both stages is, therefore, 10,000 + 0 = 10,000.

TestTym, on the other hand, filters everything out during the first state. SelectMany gets an IEnumerable of empty IEnumerables, so the combined number of times an iterator returns a value during any of the two stages is 0 + 0 = 0.

I changed conditon in queries to Where(l => true), and Tym is now slower than Jon. Why?

Now the total number of items returned in both stages is the same, 10,000 + 10,000 = 20,000. Now the difference comes down to the way the nested loop of SelectMany operates:

foreach (TResult subElement in selector(element)) {
    yield return subElement; //^^^^^^^^^^^^^^^^^
}

In in Jon's case selector(element) returns List<Strc>. It looks like foreach figures this out, and iterates over it with less overhead than in Tym's case, which constructs and returns new iterator objects.

Adding Select(v => v) to Jon eliminates the possibility to apply this optimization, so the results in the second update are within the margin of error.

like image 164
Sergey Kalinichenko Avatar answered Oct 09 '22 12:10

Sergey Kalinichenko