Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Surprising or wrong benchmarks of Where(predicate).FirstOrDefault() vs FirstOrDefault(predicate)?

I know this question has been asked a lot by people and some even say

So, first(FirstOrDefault(predicate)) one is better in terms of performance1

and I get it, I also think that one more method call should be slightly slower, its just intuition that I have. Nevertheless, I decided to run some benchmarks in order to prove my right and boy little did I know.

This is what I had as results of running my benchmarks:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-3630QM CPU 2.40GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  Job-XMZTSC : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT
Runtime=.NET 4.7.2  

Method                            N      Mean         Error      StdDev    Ratio    RatioSD
WherePlusFirstOrDefaultArray    10000   31.44 us    0.288 us    0.270 us    0.40    0.00
FirstOrDefaultArray             10000   78.47 us    0.679 us    0.635 us    1.00    0.00
WherePlusFirstOrDefaultList     10000   54.27 us    1.070 us    1.099 us    0.69    0.02
FirstOrDefaultList              10000   100.84 us   1.722 us    1.611 us    1.29    0.02
WherePlusFirstOrDefaultArray    100000  325.41 us   4.840 us    4.527 us    0.39    0.01
FirstOrDefaultArray             100000  829.85 us   16.513 us   15.446 us   1.00    0.00
WherePlusFirstOrDefaultList     100000  558.10 us   5.507 us    5.151 us    0.67    0.01
FirstOrDefaultList              100000  1,026.93 us 17.648 us   16.508 us   1.24    0.02
WherePlusFirstOrDefaultArray    1000000 3,255.46 us 9.615 us    7.507 us    0.40    0.01
FirstOrDefaultArray             1000000 8,134.15 us 108.425 us  101.420 us  1.00    0.00
WherePlusFirstOrDefaultList     1000000 5,477.63 us 70.584 us   66.024 us   0.67    0.01
FirstOrDefaultList              1000000 9,987.54 us 64.239 us   60.089 us   1.23    0.02

Not only Where(predicate).FirstOrDefault() was faster, but at what margin.

This is my benchmark code using BenchmarkDotNet

[SimpleJob(RuntimeMoniker.Net472)]
public class Benchmarks
{
    private int[] _array;
    private List<int> _list;

    [Params(10000, 100000, 1000000)]
    public int N;

    [GlobalSetup]
    public void Setup()
    {
        _array = new int[N];
        _list = new List<int>(N);

        _array = Enumerable
            .Repeat(0, N - 1).ToArray();
        _list = Enumerable
            .Repeat(0, N - 1).ToList();

        _array[N - 2] = 7;
        _list[N - 2] = 7;
    }

    [Benchmark]
    public int WherePlusFirstOrDefaultArray()
    {
        var seven = _array.Where(n => n == 7).FirstOrDefault();

        return seven;
    }

    [Benchmark(Baseline = true)]
    public int FirstOrDefaultArray()
    {
        var seven = _array.FirstOrDefault(n => n == 7);

        return seven;
    }

    [Benchmark]
    public int WherePlusFirstOrDefaultList()
    {
        var seven = _list.Where(n => n == 7).FirstOrDefault();

        return seven;
    }

    [Benchmark]
    public int FirstOrDefaultList()
    {
        var seven = _list.FirstOrDefault(n => n == 7);

        return seven;
    }
}

Since I was stunned from the results leaving me with no other choice but to ask you guys what am I doing wrong or am I missing something?


EDIT:

I added benchmarks for array vs list structure to the guys thinking it might be because of the List.

EDIT2: The saga continues and I think I am closer to the answer. Adding hardware counter to my benchmark yielded following interesting results:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-3630QM CPU 2.40GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  Job-ZTIMEH : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT
Runtime=.NET 4.7.2

Method                             N     Mean        Error      StdDev     Ratio    RatioSD CacheMisses/Op  BranchMispredictions/Op
WherePlusFirstOrDefaultArray    1000000 3.222 ms    0.0224 ms   0.0210 ms    0.39     0.01      885                  327
FirstOrDefaultArray             1000000 8.166 ms    0.1992 ms   0.1863 ms   1.00      0.00     1,795                 810
WherePlusFirstOrDefaultList     1000000 5.564 ms    0.1066 ms   0.1228 ms   0.68      0.02     1,051                 503
FirstOrDefaultList              1000000 10.161 ms   0.1816 ms   0.1699 ms   1.24      0.03     3,451                1,442

For some reason, I can't still explain to myself why FirstOrDefault(predicate) method is yielding 2 to 3 times more branch mispredictions and ofc cache misses than Where(predicate).FirstOrDefault(), surely this has to play a bit part in the results I am observing previously.

Also, one curious thing, if you look at FirstOrDefaultArray and FirstOrDefaultList results and compare them you will see that list is 24% slower, but assemblies generated by these methods are identical to me: https://www.diffchecker.com/WSjAQlet (I stripped memory addresses of the instructions.)

like image 886
kuskmen Avatar asked Feb 19 '20 22:02

kuskmen


People also ask

Which is better SingleOrDefault or FirstOrDefault?

When you want a default value is returned if the result set contains no record, use SingleOrDefault. When you always want one record no matter what the result set contains, use First or FirstOrDefault. When you want a default value if the result set contains no record, use FirstOrDefault.

What is the difference between FirstOrDefault () and SingleOrDefault () extension method in Linq?

SingleOrDefault() Vs. FirstOrDefault() in LINQ Query SingleOrDefault() – Same as Single(), but it can handle the null value. First() - There is at least one result, an exception is thrown if no result is returned. FirstOrDefault() - Same as First(), but not thrown any exception or return null when there is no result.

What is the difference between FirstOrDefault and find?

The key thing to notice here is that FirstOrDefault is called on Enumerable whereas Find is called as a method on the source list. So it's iterating over an array of items which makes sense, since a list is a wrapper on an array.

Is FirstOrDefault slow?

We use Redgate's Performance profiler to find some performance leaks. Our tool uses Linq to objects in several methods. But we have noticed that a FirstOrDefault takes very long on collections with +/- 1000 objects. The profiler also alerts that the query is very slow.

What is the difference between firstordefault() and dictionarykey()?

I hope now you know that retrieving the data by a DictionaryKey is very very faster than FirstOrDefault () LINQ function. Here we see a very huge difference in execution time to retrieve the data in both ways. One of the data was retrieved by FirstOrDefault () and it has taken ~20.02 ms.

What is the difference between find () and firstordefault ()?

He did say about using Find () and FirstOrDefault (condition) in query and Find () will search for the data you had performed something on that object ( add or edit or delete - but not yet saved into the database ) meanwhile FirstOrDefault will only look for what already have been saved Show activity on this post.

Can I use where () and firstordefault () together?

You should not chain both .Where () with .FirstOrDefault () because the .Where () goes through the whole array and then will iterate through that list to find the first item. You save an incredible amount of time by putting your search predicate in the FirstOrDefault () method.

When to use firstordefault() LINQ statement?

It is always recommended to use the Dictionary collection when you are dealing with a large amount of data. FirstOrDefault () LINQ statement is suitable for the small amount of data. Just taking another scenario, if you want to retrieve the following Aadhaar details like 40-50 records and store them in another collection.


1 Answers

The generic Enumerable.Where function maps to different subclasses based on the type of argument. In this case, your argument is a List<int> so you get returned from Where a Enumerable.WhereListIterator<int> that takes a List<int> parameter. It then uses List<T>.GetEnumerator() to enumerator the list, which returns a List<T>.Enumerator struct, which uses an index to index into the List<> and return each member. This is very fast.

FirstOrDefault(Func<> pred) doesn't have this optimization, and uses foreach to traverse the list. While this also ultimately uses the same very fast List<T>.Enumerator, it calls its member methods via the IEnumerable<T> interface, which is considerably slower than calling the List<T>.Enumerator methods directly.

My testing shows that the result is FirstOrDefault(Func<> pred) takes about twice as long per element of the source list. If you write your own FirstOrDefault<T>(List<T> src, Func<T,bool> pred) using either GetEnumerator or foreach, it will run about twice as fast as the built-in FirstOrDefault.

like image 198
NetMage Avatar answered Oct 16 '22 19:10

NetMage