During answering on one of questions I saw 2 examples of LINQ code which should work exactly same. But I was wonder about performance, and found that one code much faster that another code. And I cannot understand why.
I took datastructures from question
public struct Strc
{
public decimal A;
public decimal B;
// more stuff
}
public class CLASS
{
public List<Strc> listStrc = new List<Strc>();
// other stuff
}
then I wrote simple benchmark tests (used benchmarkdotnet library)
UPD I included all tests which was requested
public class TestCases
{
private Dictionary<string, CLASS> dict;
public TestCases()
{
var m = 100;
var n = 100;
dict = Enumerable.Range(0, m)
.Select(x => new CLASS()
{
listStrc = Enumerable.Range(0, n)
.Select(y => new Strc() { A = y % 4, B = y }).ToList()
})
.ToDictionary(x => Guid.NewGuid().ToString(), x => x);
}
Greater than 3 tests
[Benchmark]
public void TestJon_Gt3()
{
var result = dict.Values
.SelectMany(x => x.listStrc)
.Where(ls => ls.A > 3)
.Select(ls => ls.B).ToArray();
}
[Benchmark]
public void TestTym_Gt3()
{
var result = dict.Values
.SelectMany(x => x.listStrc.Where(l => l.A > 3))
.Select(x => x.B).ToArray();
}
[Benchmark]
public void TestDasblinkenlight_Gt3()
{
var result = dict.Values
.SelectMany(x => x.listStrc.Select(v => v))
.Where(l => l.A > 3)
.Select(ls => ls.B).ToArray();
}
[Benchmark]
public void TestIvan_Gt3()
{
var result = dict.Values
.SelectMany(x => x.listStrc.Where(l => l.A > 3).Select(l => l.B))
.ToArray();
}
Return true tests
[Benchmark]
public void TestJon_True()
{
var result = dict.Values
.SelectMany(x => x.listStrc)
.Where(ls => true)
.Select(ls => ls.B).ToArray();
}
[Benchmark]
public void TestTym_True()
{
var result = dict.Values
.SelectMany(x => x.listStrc.Where(l => true))
.Select(x => x.B).ToArray();
}
[Benchmark]
public void TestDasblinkenlight_True()
{
var result = dict.Values
.SelectMany(x => x.listStrc.Select(v => v))
.Where(ls => true)
.Select(ls => ls.B).ToArray();
}
[Benchmark]
public void TestIvan_True()
{
var result = dict.Values
.SelectMany(x => x.listStrc.Where(l => true).Select(l => l.B))
.ToArray();
}
}
I ran those tests
static void Main(string[] args)
{
var summary = BenchmarkRunner.Run<TestCases>();
}
and got results
// * Summary *
BenchmarkDotNet=v0.10.9, OS=Windows 7 SP1 (6.1.7601)
Processor=Intel Core i7-4770 CPU 3.40GHz (Haswell), ProcessorCount=8
Frequency=3312841 Hz, Resolution=301.8557 ns, Timer=TSC
[Host] : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0
DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0
Method | Mean | Error | StdDev |
------------------------- |-----------:|-----------:|-----------:|
TestJon_Gt3 | 655.1 us | 1.3408 us | 1.2542 us |
TestTym_Gt3 | 353.1 us | 12.9535 us | 10.8167 us |
TestDasblinkenlight_Gt3 | 943.9 us | 1.9563 us | 1.7342 us |
TestIvan_Gt3 | 352.6 us | 0.7216 us | 0.6397 us |
TestJon_True | 801.8 us | 2.7194 us | 2.2708 us |
TestTym_True | 1,055.8 us | 3.0912 us | 2.7403 us |
TestDasblinkenlight_True | 1,090.6 us | 2.3084 us | 2.1593 us |
TestIvan_True | 677.7 us | 3.0427 us | 2.8461 us |
// * Hints *
Outliers
TestCases.TestTym_Gt3: Default -> 2 outliers were removed
TestCases.TestDasblinkenlight_Gt3: Default -> 1 outlier was removed
TestCases.TestIvan_Gt3: Default -> 1 outlier was removed
TestCases.TestJon_True: Default -> 2 outliers were removed
TestCases.TestTym_True: Default -> 1 outlier was removed
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
1 us : 1 Microsecond (0.000001 sec)
I tried to change initial data (n and m parameters), but results was stable, TestTym was faster than TestJon each time. And TestIvan is semms fastest from all tests. I just want to understand, why it faster? Or maybe I did smthg wrong during testing?
LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code. And if you'd like to measure the performance difference, you can use a tool like BenchmarkDotNet to do so.
More importantly: when it comes to querying databases, LINQ is in most cases a significantly more productive querying language than SQL. Compared to SQL, LINQ is simpler, tidier, and higher-level.
The Dynamic LINQ library exposes a set of extension methods on IQueryable corresponding to the standard LINQ methods at Queryable, and which accept strings in a special syntax instead of expression trees.
Since ultimately both expressions filter out all items, the time difference is due to the different number of times an intermediate iterator returns a value in the combined chain of statements.
To understand what is going on consider the implementation of SelectMany
from the reference source, with arguments checking removed:
public static IEnumerable<TResult> SelectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
return SelectManyIterator<TSource, TResult>(source, selector);
}
static IEnumerable<TResult> SelectManyIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
foreach (TSource element in source) {
foreach (TResult subElement in selector(element)) {
yield return subElement;
}
}
}
Select
is implemented with a series of different iterators based on the type of collection being enumerated - WhereSelectArrayIterator
, WhereSelectListIterator
, or WhereSelectEnumerableIterator
.
Your test code generates cases in which A
s are in the range from zero to three, inclusive:
Select(y => new Strc() { A = y % 4, B = y })
// ^^^^^^^^^
Therefore, condition Where(ls => ls.A > 3)
produces no matches.
In the TestJon
example yield return
inside SelectMany
is hit 10,000 times, because everything is selected prior to filtering. After that Select
uses WhereSelectEnumerableIterator
, which finds no matches. The number of times the iterator returns a value in both stages is, therefore, 10,000 + 0 = 10,000.
TestTym
, on the other hand, filters everything out during the first state. SelectMany
gets an IEnumerable
of empty IEnumerable
s, so the combined number of times an iterator returns a value during any of the two stages is 0 + 0 = 0.
I changed conditon in queries to
Where(l => true)
, andTym
is now slower thanJon
. Why?
Now the total number of items returned in both stages is the same, 10,000 + 10,000 = 20,000. Now the difference comes down to the way the nested loop of SelectMany
operates:
foreach (TResult subElement in selector(element)) {
yield return subElement; //^^^^^^^^^^^^^^^^^
}
In in Jon
's case selector(element)
returns List<Strc>
. It looks like foreach
figures this out, and iterates over it with less overhead than in Tym
's case, which constructs and returns new iterator objects.
Adding Select(v => v)
to Jon
eliminates the possibility to apply this optimization, so the results in the second update are within the margin of error.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With