Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

foreach + break vs linq FirstOrDefault performance difference

I have two classes that perform date date range data fetching for particular days.

public class IterationLookup<TItem> {     private IList<Item> items = null;      public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)     {         this.items = items.OrderByDescending(keySelector).ToList();     }      public TItem GetItem(DateTime day)     {         foreach(TItem i in this.items)         {            if (i.IsWithinRange(day))            {                return i;            }         }         return null;     } }   public class LinqLookup<TItem> {     private IList<Item> items = null;      public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)     {         this.items = items.OrderByDescending(keySelector).ToList();     }      public TItem GetItem(DateTime day)     {         return this.items.FirstOrDefault(i => i.IsWithinRange(day));     } } 

Then I do speed tests which show that Linq version is about 5 times slower. This would make sense when I would store items locally without enumerating them using ToList. This would make it much slower, because with every call to FirstOrDefault, linq would also perform OrderByDescending. But that's not the case so I don't really know what's going on. Linq should perform very similar to iteration.

This is the code excerpt that measures my timings

IList<RangeItem> ranges = GenerateRanges(); // returns List<T>  var iterLookup = new IterationLookup<RangeItems>(ranges, r => r.Id); var linqLookup = new LinqLookup<RangeItems>(ranges, r => r.Id);  Stopwatch timer = new Stopwatch();  timer.Start(); for(int i = 0; i < 1000000; i++) {     iterLookup.GetItem(GetRandomDay()); } timer.Stop(); // display elapsed time  timer.Restart(); for(int i = 0; i < 1000000; i++) {     linqLookup.GetItem(GetRandomDay()); } timer.Stop(); // display elapsed time 

Why do I know it should perform better? Because when I write a very similar code without using these lookup classes, Linq performs very similar to foreach iterations...

// continue from previous code block  // items used by both order as they do in classes as well IList<RangeItem> items = ranges.OrderByDescending(r => r.Id).ToList();  timer.Restart(); for(int i = 0; i < 1000000; i++) {     DateTime day = GetRandomDay();     foreach(RangeItem r in items)     {         if (r.IsWithinRange(day))         {             // RangeItem result = r;             break;         }     } }     timer.Stop(); // display elapsed time  timer.Restart(); for(int i = 0; i < 1000000; i++) {    DateTime day = GetRandomDay();    items.FirstOrDefault(i => i.IsWithinRange(day)); } timer.Stop(); // display elapsed time 

This is by my opinion highly similar code. FirstOrDefault as much as I know also iterate for only as long until it gets to a valid item or until it reaches the end. And this is somehow the same as foreach with break.

But even iteration class performs worse than my simple foreach iteration loop which is also a mystery because all the overhead it has is the call to a method within a class compared to direct access.

Question

What am I doing wrong in my LINQ class that it performs so very slow?
What am I doing wrong in my Iteration class so it performs twice as slow as direct foreach loop?

Which times are being measured?

I do these steps:

  1. Generate ranges (as seen below in results)
  2. Create object instances for IterationLookup, LinqLookup (and my optimized date range class BitCountLookup which is not part of discussion here)
  3. Start timer and execute 1 million lookups on random days within maximum date range (as seen in results) by using previously instantiated IterationLookup class.
  4. Start timer and execute 1 million lookups on random days within maximum date range (as seen in results) by using previously instantiated LinqLookup class.
  5. Start timer and execute 1 million lookups (6 times) using manual foreach+break loops and Linq calls.

As you can see object instantiation is not measured.

Appendix I: Results over million lookups

Ranges displayed in these results don't overlap, which should make both approaches even more similar in case LINQ version doesn't break loop on successful match (which it highly likely does).

 Generated Ranges:  ID Range        000000000111111111122222222223300000000011111111112222222222                 123456789012345678901234567890112345678901234567890123456789 09 22.01.-30.01.                     |-------| 08 14.01.-16.01.             |-| 07 16.02.-19.02.                                              |--| 06 15.01.-17.01.              |-| 05 19.02.-23.02.                                                 |---| 04 01.01.-07.01.|-----| 03 02.01.-10.01. |-------| 02 11.01.-13.01.          |-| 01 16.01.-20.01.               |---| 00 29.01.-06.02.                            |-------|  Lookup classes...  - Iteration: 1028ms - Linq: 4517ms   !!! THIS IS THE PROBLEM !!! - BitCounter: 401ms  Manual loops...  - Iter: 786ms - Linq: 981ms - Iter: 787ms - Linq: 996ms - Iter: 787ms - Linq: 977ms - Iter: 783ms - Linq: 979ms 

Appendix II: GitHub:Gist code to test for yourself

I've put up a Gist so you can get the full code yourself and see what's going on. Create a Console application and copy Program.cs into it an add other files that are part of this gist.

Grab it here.

Appendix III: Final thoughts and measurement tests

The most problematic thing was of course LINQ implementatino that was awfully slow. Turns out that this has all to do with delegate compiler optimization. LukeH provided the best and most usable solution that actually made me try different approaches to this. I've tried various different approaches in the GetItem method (or GetPointData as it's called in Gist):

  1. the usual way that most of developers would do (and is implemented in Gist as well and wasn't updated after results revealed this is not the best way of doing it):

    return this.items.FirstOrDefault(item => item.IsWithinRange(day)); 
  2. by defining a local predicate variable:

    Func<TItem, bool> predicate = item => item.IsWithinRange(day); return this.items.FirstOrDefault(predicate); 
  3. local predicate builder:

    Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d); return this.items.FirstOrDefault(builder(day)); 
  4. local predicate builder and local predicate variable:

    Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d); Func<TItem, bool> predicate = builder(day); return this.items.FirstOrDefault(predicate); 
  5. class level (static or instance) predicate builder:

    return this.items.FirstOrDefault(classLevelBuilder(day)); 
  6. externally defined predicate and provided as method parameter

    public TItem GetItem(Func<TItem, bool> predicate) {     return this.items.FirstOrDefault(predicate); } 

    when executing this method I also took two approaches:

    1. predicate provided directly at method call within for loop:

      for (int i = 0; i < 1000000; i++) {     linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay())); } 
    2. predicate builder defined outside for loop:

      Func<DateTime, Func<Ranger, bool>> builder = d => r => r.IsWithinRange(d); for (int i = 0; i < 1000000; i++) {     linqLookup.GetItem(builder(GetRandomDay())); } 

Results - what performs best

For comparison when using iteration class, it takes it approx. 770ms to execute 1 million lookups on randomly generated ranges.

  1. 3 local predicate builder turns out to be best compiler optimized so it performs almost as fast as usual iterations. 800ms.

  2. 6.2 predicate builder defined outside for loop: 885ms

  3. 6.1 predicate defined within for loop: 1525ms

  4. all others took somewhere between 4200ms - 4360ms and are thus considered unusable.

So whenever you use a predicate in externally frequently callable method, define a builder and execute that. This will yield best results.

The biggest surprise to me about this is that delegates (or predicates) may be this much time consuming.

like image 477
Robert Koritnik Avatar asked Nov 21 '11 15:11

Robert Koritnik


People also ask

Is foreach faster than LINQ?

LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code.

Is FirstOrDefault slow?

The query is executed when you read from the expression. In this case the FirstOrDefault will execute the whole expression tree, so it's not the FirstOrDefault thats slow.

Is FirstOrDefault faster than first?

Performance. First() and FirstOrDefault() are faster than Single() and SingleOrDefault() because First find the first() match only whereas Single() search element in a whole collection of elements.

Which is faster for or foreach C#?

The forloop is faster than the foreach loop if the array must only be accessed once per iteration.


2 Answers

Sometimes LINQ appears slower because the generation of delegates in a loop (especially a non-obvious loop over method calls) can add time. Instead, you may want to consider moving your finder out of the class to make it more generic (like your key selector is on construction):

public class LinqLookup<TItem, TKey> {     private IList<Item> items = null;      public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)     {         this.items = items.OrderByDescending(keySelector).ToList();     }      public TItem GetItem(Func<TItem, TKey> selector)     {         return this.items.FirstOrDefault(selector);     } } 

Since you don't use a lambda in your iterative code, this can be a bit of a difference since it has to create the delegate on each pass through the loop. Usually, this time is insignificant for every-day coding, and the time to invoke the delegate is no more expensive than other method calls, it's just the delegate creation in a tight loop that can add that little bit of extra time.

In this case, since the delegate never changes for the class, you can create it outside of the code you are looping through and it would be more efficient.

Update:

Actually, even without any optimization, compiling in release mode on my machine I do not see the 5x difference. I just performed 1,000,000 lookups on an Item that only has a DateTime field, with 5,000 items in the list. Of course, my data, etc, are different, but you can see the times are actually really close when you abstract out the delegate:

iterative : 14279 ms, 0.014279 ms/call

linq w opt : 17400 ms, 0.0174 ms/call

These time differences are very minor and worth the readability and maintainability improvements of using LINQ. I don't see the 5x difference though, which leads me to believe there's something we're not seeing in your test harness.

like image 113
James Michael Hare Avatar answered Sep 23 '22 02:09

James Michael Hare


Further to Gabe's answer, I can confirm that the difference appears to be caused by the cost of re-constructing the delegate for every call to GetPointData.

If I add a single line to the GetPointData method in your IterationRangeLookupSingle class then it slows right down to the same crawling pace as LinqRangeLookupSingle. Try it:

// in IterationRangeLookupSingle<TItem, TKey> public TItem GetPointData(DateTime point) {     // just a single line, this delegate is never used     Func<TItem, bool> dummy = i => i.IsWithinRange(point);      // the rest of the method remains exactly the same as before     // ... } 

(I'm not sure why the compiler and/or jitter can't just ignore the superfluous delegate that I added above. Obviously, the delegate is necessary in your LinqRangeLookupSingle class.)

One possible workaround is to compose the predicate in LinqRangeLookupSingle so that point is passed to it as an argument. This means that the delegate only needs to be constructed once, not every time the GetPointData method is called. For example, the following change will speed up the LINQ version so that it's pretty much comparable with the foreach version:

// in LinqRangeLookupSingle<TItem, TKey> public TItem GetPointData(DateTime point) {     Func<DateTime, Func<TItem, bool>> builder = x => y => y.IsWithinRange(x);     Func<TItem, bool> predicate = builder(point);      return this.items.FirstOrDefault(predicate); } 
like image 20
LukeH Avatar answered Sep 24 '22 02:09

LukeH