
How does deferred LINQ query execution actually work?

Tags:

c#

linq

Recently I came across this question: which numbers will be printed by the following code?

    using System;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            int[] numbers = { 1, 3, 5, 7, 9 };
            int threshold = 6;
            var query = from value in numbers where value >= threshold select value;

            threshold = 3;
            var result = query.ToList();

            result.ForEach(Console.WriteLine);
            Console.ReadLine();
        }
    }

Answer: 3, 5, 7, 9

That was quite surprising to me. I thought the value of threshold would be put on the stack when the query is constructed, and that the same number would be pulled back and used in the condition at execution time. That is not what happened.

Another case (numbers is set to null just before execution):

    static void Main(string[] args)
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        int threshold = 6;
        var query = from value in numbers where value >= threshold select value;

        threshold = 3;
        numbers = null;
        var result = query.ToList();
        ...
    }

This seems to have no effect on the query: it prints exactly the same answer as in the previous example.

Could anyone help me understand what is really going on behind the scenes? Why does changing threshold affect the query execution while changing numbers doesn't?

michal-mad asked Nov 19 '17 16:11




2 Answers

Your query can be written like this in method syntax:

var query = numbers.Where(value => value >= threshold); 

Or:

    Func<int, bool> predicate = delegate(int value)
    {
        return value >= threshold;
    };
    IEnumerable<int> query = numbers.Where(predicate);

These pieces of code (including your own query in query syntax) are all equivalent.

When you unroll the query like that, you see that predicate is an anonymous method and threshold is a variable captured by that method (together they form a closure). That means the method sees the value threshold has at the time of execution, not at the time of declaration. The compiler generates an actual (non-anonymous) method that takes care of that. The method is not executed when it is declared, but once for each item when query is enumerated (the execution is deferred). Since the enumeration happens after the value of threshold is changed (and threshold is captured, not copied), the new value is used.
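The capture can be seen without LINQ at all. Here is a minimal sketch (the names CaptureDemo and ReadAfterChange are made up for illustration): a lambda that reads a local sees whatever the local holds when the lambda runs, not what it held when the lambda was created.

```csharp
using System;

static class CaptureDemo
{
    // Creates a lambda that reads a local, changes the local, then invokes the lambda.
    public static int ReadAfterChange()
    {
        int threshold = 6;
        Func<int> readThreshold = () => threshold; // captures the variable, not the value 6
        threshold = 3;                             // changes the same captured variable
        return readThreshold();                    // runs only now, so it sees 3
    }

    static void Main()
    {
        Console.WriteLine(ReadAfterChange()); // prints 3
    }
}
```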

When you set numbers to null, you point the variable at nothing, but the array object still exists. The IEnumerable returned by Where (and referenced in query) still references the array, so it does not matter that the original variable is null now.

That explains the behavior: numbers and threshold play different roles in the deferred execution. numbers is a reference to the array that is enumerated, while threshold is a local variable whose scope is "forwarded" to the anonymous method.
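To illustrate the difference between the variable and the object it refers to, here is a small sketch (SourceDemo and Run are illustrative names, not part of the question). Changing an element of the array before enumeration is visible to the query, because the query holds the same array object; setting the variable to null is not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SourceDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        int threshold = 6;
        IEnumerable<int> query = numbers.Where(value => value >= threshold);

        numbers[0] = 11; // mutates the array object the query already holds
        numbers = null;  // only clears the local variable; the query keeps its own reference

        return query.ToList(); // enumerates the original (now modified) array
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 11, 7, 9
    }
}
```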

Extension, part 1: Modification of the closure during the enumeration

You can take your example one step further when you replace the line...

var result = query.ToList(); 

...with:

    List<int> result = new List<int>();
    foreach (int value in query)
    {
        threshold = 8;
        result.Add(value);
    }

Here you change the value of threshold while the array is being iterated. When you hit the body of the loop for the first time (when value is 3), you change threshold to 8, which means the values 5 and 7 are skipped and the next value added to the list is 9. The reason is that threshold is evaluated again for every item, and whatever value it holds at that moment is used. Since threshold has changed to 8, the numbers 5 and 7 no longer evaluate as greater than or equal to it.
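Put together as a complete program, the experiment might look like this (a sketch; MidEnumerationDemo and Run are illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MidEnumerationDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        int threshold = 6;
        var query = from value in numbers where value >= threshold select value;

        threshold = 3; // in effect when enumeration starts

        var result = new List<int>();
        foreach (int value in query)
        {
            threshold = 8;     // takes effect for every item tested after this point
            result.Add(value);
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 3, 9
    }
}
```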

Extension, part 2: Entity Framework is different

To make things more complicated, things are slightly different when you use LINQ providers that translate your original query into a different one and then execute that. The most common examples are Entity Framework (EF) and LINQ2SQL (now largely superseded by EF). These providers create an SQL query from the original query before the enumeration. This time the value of threshold is read only once, when the SQL is built (the lambda is not compiled into an anonymous method at all; the compiler generates an expression tree that the provider inspects), so changes to threshold during the enumeration have no effect on the result. These changes happen after the query is submitted to the database.

The lesson from this is that you always have to be aware of which flavor of LINQ you are using, and that some understanding of its inner workings is an advantage.
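The principle can be sketched without a database. The Translate method below is a hypothetical stand-in for a provider's translation step, not how EF is actually implemented: it reads the captured value out of the expression tree once and bakes it into a SQL-like string, so later changes to threshold cannot reach the text that was already produced.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionSnapshotDemo
{
    // Hypothetical translation step: evaluates the right-hand side of the
    // comparison (the captured variable) once and embeds the result as text.
    static string Translate(Expression<Func<int, bool>> filter)
    {
        var comparison = (BinaryExpression)filter.Body; // value >= threshold
        int snapshot = Expression.Lambda<Func<int>>(comparison.Right).Compile()();
        return "WHERE value >= " + snapshot;
    }

    public static string Run()
    {
        int threshold = 6;
        Expression<Func<int, bool>> filter = value => value >= threshold;

        threshold = 3;                  // still visible: nothing has been translated yet
        string sql = Translate(filter); // the value is read here, once
        threshold = 8;                  // too late to affect the translated text

        return sql;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints WHERE value >= 3
    }
}
```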

Sefe answered Oct 13 '22 12:10


The easiest way is to look at what the compiler generates. You can use this site: https://sharplab.io

    using System.Linq;

    public class MyClass
    {
        public void MyMethod()
        {
            int[] numbers = { 1, 3, 5, 7, 9 };

            int threshold = 6;

            var query = from value in numbers where value >= threshold select value;

            threshold = 3;
            numbers = null;

            var result = query.ToList();
        }
    }

And here is the output:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Reflection;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;
    using System.Security;
    using System.Security.Permissions;

    [assembly: AssemblyVersion("0.0.0.0")]
    [assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default | DebuggableAttribute.DebuggingModes.DisableOptimizations | DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints | DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
    [assembly: CompilationRelaxations(8)]
    [assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
    [assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
    [module: UnverifiableCode]
    public class MyClass
    {
        [CompilerGenerated]
        private sealed class <>c__DisplayClass0_0
        {
            public int threshold;

            internal bool <MyMethod>b__0(int value)
            {
                return value >= this.threshold;
            }
        }

        public void MyMethod()
        {
            MyClass.<>c__DisplayClass0_0 <>c__DisplayClass0_ = new MyClass.<>c__DisplayClass0_0();
            int[] expr_0D = new int[5];
            RuntimeHelpers.InitializeArray(expr_0D, fieldof(<PrivateImplementationDetails>.D603F5B3D40E40D770E3887027E5A6617058C433).FieldHandle);
            int[] source = expr_0D;
            <>c__DisplayClass0_.threshold = 6;
            IEnumerable<int> source2 = source.Where(new Func<int, bool>(<>c__DisplayClass0_.<MyMethod>b__0));
            <>c__DisplayClass0_.threshold = 3;
            List<int> list = source2.ToList<int>();
        }
    }
    [CompilerGenerated]
    internal sealed class <PrivateImplementationDetails>
    {
        [StructLayout(LayoutKind.Explicit, Pack = 1, Size = 20)]
        private struct __StaticArrayInitTypeSize=20
        {
        }

        internal static readonly <PrivateImplementationDetails>.__StaticArrayInitTypeSize=20 D603F5B3D40E40D770E3887027E5A6617058C433 = bytearray(1, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 7, 0, 0, 0, 9, 0, 0, 0);
    }

As you can see, changing the threshold variable really changes a field of an auto-generated class. Because the query can be executed at any time, even after the method has returned, the delegate cannot refer to a local that lives on the stack: once the method exits, threshold would be removed from the stack. So the compiler turns the local into a field of the same type in an auto-generated class.
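The decompiler output uses names that are not legal C#, so it cannot be compiled as shown. A hand-written equivalent, assuming made-up names (ThresholdHolder, IsAtLeastThreshold) in place of the generated ones, might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-written stand-in for the compiler-generated <>c__DisplayClass0_0.
sealed class ThresholdHolder
{
    public int Threshold;

    public bool IsAtLeastThreshold(int value)
    {
        return value >= Threshold; // reads the field every time it is called
    }
}

static class ManualClosureDemo
{
    public static List<int> Run()
    {
        var holder = new ThresholdHolder();
        int[] numbers = { 1, 3, 5, 7, 9 };

        holder.Threshold = 6;
        IEnumerable<int> query = numbers.Where(new Func<int, bool>(holder.IsAtLeastThreshold));
        holder.Threshold = 3; // same object the delegate is bound to

        return query.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 3, 5, 7, 9
    }
}
```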

Now the second question: why does setting numbers to null have no effect? (The assignment does not appear in the decompiled output above.)

A call to source.Where invokes this extension method:

    public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
        if (source == null) throw Error.ArgumentNull("source");
        if (predicate == null) throw Error.ArgumentNull("predicate");
        if (source is Iterator<TSource>) return ((Iterator<TSource>)source).Where(predicate);
        if (source is TSource[]) return new WhereArrayIterator<TSource>((TSource[])source, predicate);
        if (source is List<TSource>) return new WhereListIterator<TSource>((List<TSource>)source, predicate);
        return new WhereEnumerableIterator<TSource>(source, predicate);
    }

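As you can see, it passes the source reference on to an iterator object. For an array the source is TSource[] branch is taken, so a WhereArrayIterator<TSource> is created; the general fallback is:

WhereEnumerableIterator<TSource>(source, predicate);

And here is the source code for that iterator (the array and list variants hold on to their source in the same way):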

    class WhereEnumerableIterator<TSource> : Iterator<TSource>
    {
        IEnumerable<TSource> source;
        Func<TSource, bool> predicate;
        IEnumerator<TSource> enumerator;

        public WhereEnumerableIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate) {
            this.source = source;
            this.predicate = predicate;
        }

        public override Iterator<TSource> Clone() {
            return new WhereEnumerableIterator<TSource>(source, predicate);
        }

        public override void Dispose() {
            if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose();
            enumerator = null;
            base.Dispose();
        }

        public override bool MoveNext() {
            switch (state) {
                case 1:
                    enumerator = source.GetEnumerator();
                    state = 2;
                    goto case 2;
                case 2:
                    while (enumerator.MoveNext()) {
                        TSource item = enumerator.Current;
                        if (predicate(item)) {
                            current = item;
                            return true;
                        }
                    }
                    Dispose();
                    break;
            }
            return false;
        }

        public override IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector) {
            return new WhereSelectEnumerableIterator<TSource, TResult>(source, predicate, selector);
        }

        public override IEnumerable<TSource> Where(Func<TSource, bool> predicate) {
            return new WhereEnumerableIterator<TSource>(source, CombinePredicates(this.predicate, predicate));
        }
    }

So it simply keeps a reference to our source object in a private field.
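The same effect can be reproduced with a much smaller filter. This is a simplified sketch, not the library's implementation (SimpleWhere is a made-up name): for an iterator method the compiler generates a state object that stores source and predicate, so the sequence stays reachable after the caller's variable is cleared.

```csharp
using System;
using System.Collections.Generic;

static class SimpleWhereDemo
{
    // Hypothetical simplified filter. Nothing in the body runs until the result
    // is enumerated; until then the arguments are just stored.
    static IEnumerable<T> SimpleWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        IEnumerable<int> query = SimpleWhere(numbers, value => value >= 3);

        numbers = null; // the iterator object still holds the array

        return new List<int>(query); // enumeration happens here
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 3, 5, 7, 9
    }
}
```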

apocalypse answered Oct 13 '22 13:10