Background
The background for this is that I had a recent conversation in the comments with another clearly knowledgeable user about how LINQ is compiled. I first "summarized" and said LINQ was compiled to a for loop. While this isn't correct, my understanding from other stacks such as this one is that the LINQ query is compiled to a lambda with a loop inside of it. This is then called when the variable is enumerated for the first time (after which the results are stored). The other user said that LINQ takes additional optimizations such as hashing. I couldn't find any supporting documentation either for or against this.
I know this seems like a really obscure point, but I have always felt that if I don't understand completely how something works, it's going to be difficult to understand why I'm not using it correctly.
The Question
So, let's take the following very simple example:
var productNames =
    from p in products
    where p.Id > 100 && p.Id < 5000
    select p.ProductName;
What is this statement actually compiled to for the CLR? What optimizations does LINQ make compared to me just writing a function that manually walks the collection? Is this just semantics or is there more to it than that?
Clarification
Clearly I'm asking this question because I don't understand what the inside of the LINQ "black box" looks like. Even though I understand that LINQ is complicated (and powerful), I'm mostly looking for a basic understanding of either the CLR or a functional equivalent to a LINQ statement. There are great sites out there for helping understand how to create a LINQ statement but very few of these seem to give any guidance on how those are actually compiled or run.
Side Note - I will absolutely read through the Jon Skeet series on LINQ to Objects.
Side Note 2 - I shouldn't have tagged this as LINQ to SQL. I understand how ORMs and micro-ORMs work. That is really beside the point of the question.
For LINQ to Objects, this is compiled into a set of static method calls:
var productNames =
    from p in products
    where p.Id > 100 && p.Id < 5000
    select p.ProductName;
Becomes:
IEnumerable<string> productNames = products
    .Where(p => p.Id > 100 && p.Id < 5000)
    .Select(p => p.ProductName);
This uses extension methods defined in the Enumerable type, so it is actually compiled to:
IEnumerable<string> productNames =
    Enumerable.Select(
        Enumerable.Where(products, p => p.Id > 100 && p.Id < 5000),
        p => p.ProductName
    );
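To check the claim that these three spellings are the same thing, here is a minimal sketch that runs all of them side by side. The Product class and the sample data are assumptions made up for illustration; the original post does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Product type; the post does not show its definition.
public class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

public static class QueryForms
{
    // Made-up sample data for the demonstration.
    public static List<Product> SampleProducts()
    {
        return new List<Product>
        {
            new Product { Id = 50, ProductName = "Too low" },
            new Product { Id = 150, ProductName = "Widget" },
            new Product { Id = 4999, ProductName = "Gadget" },
            new Product { Id = 5000, ProductName = "Too high" },
        };
    }

    // 1. Query syntax, as written in the question.
    public static IEnumerable<string> QuerySyntax(IEnumerable<Product> products)
    {
        return from p in products
               where p.Id > 100 && p.Id < 5000
               select p.ProductName;
    }

    // 2. Extension-method syntax, which the query syntax is translated to.
    public static IEnumerable<string> MethodSyntax(IEnumerable<Product> products)
    {
        return products
            .Where(p => p.Id > 100 && p.Id < 5000)
            .Select(p => p.ProductName);
    }

    // 3. Plain static calls, which is what extension-method calls really are.
    public static IEnumerable<string> StaticCalls(IEnumerable<Product> products)
    {
        return Enumerable.Select(
            Enumerable.Where(products, p => p.Id > 100 && p.Id < 5000),
            p => p.ProductName);
    }

    public static void Main()
    {
        List<Product> products = SampleProducts();
        Console.WriteLine(string.Join(", ", QuerySyntax(products)));
        Console.WriteLine(string.Join(", ", MethodSyntax(products)));
        Console.WriteLine(string.Join(", ", StaticCalls(products)));
    }
}
```

With this sample data, each of the three lines prints "Widget, Gadget".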
The compiler turns the lambda expressions into methods. The lambda in the where clause becomes a method that can be assigned to a Func<Product, Boolean>, and the lambda in the select becomes a method that can be assigned to a Func<Product, String>.
For a thorough explanation, see Jon Skeet's blog series: Reimplementing LINQ to Objects. He walks through the entire process of how this works, including the compiler transformations (from query syntax to method calls), how the methods are implemented, etc.
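To give a rough idea of what "how the methods are implemented" means, here is a simplified sketch of Where and Select written as iterator blocks. MyWhere and MySelect are hypothetical names chosen for this example; the real Enumerable methods also validate their arguments and have specialized paths for arrays and lists, so treat this as an approximation, not the actual source.

```csharp
using System;
using System.Collections.Generic;

public static class SketchLinq
{
    // Simplified stand-in for Enumerable.Where: a foreach loop inside an
    // iterator block. Nothing runs until the result is enumerated.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    // Simplified stand-in for Enumerable.Select.
    public static IEnumerable<TResult> MySelect<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (T item in source)
        {
            yield return selector(item);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0; // counts how often the predicate actually runs
        int[] ids = { 50, 150, 4999, 5000 };

        IEnumerable<string> names = ids
            .MyWhere(id => { calls++; return id > 100 && id < 5000; })
            .MySelect(id => "Product " + id);

        Console.WriteLine(calls);                    // 0: building the query runs nothing
        Console.WriteLine(string.Join(", ", names)); // Product 150, Product 4999
        Console.WriteLine(calls);                    // 4: one predicate call per element
        Console.WriteLine(string.Join(", ", names)); // same output again
        Console.WriteLine(calls);                    // 8: the second pass re-ran the loop
    }
}
```

In this sketch the results are not stored after the first enumeration: every pass over names runs the loops again. If you want to keep the results, materialize them once with something like ToList().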
Note that LINQ to SQL and other IQueryable<T> implementations are different. There, the lambda is compiled into an Expression<T> (an expression tree) instead of a delegate. That expression is passed to the query provider, which "transforms" it in some manner (it's up to the provider how to do this) into calls, typically run on the server in the case of an ORM.
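The difference between the two cases can be seen without any database. In this sketch the same lambda text is assigned once to a delegate and once to an expression tree; the Product class is again an assumption for illustration, and no real query provider is involved.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical Product type; the post does not show its definition.
public class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

public static class ExpressionDemo
{
    public static void Main()
    {
        // Delegate: the lambda body is compiled to IL in a generated method.
        Func<Product, bool> asDelegate = p => p.Id > 100 && p.Id < 5000;

        // Expression tree: the compiler emits code that builds a data
        // structure describing the lambda, which a provider can inspect.
        Expression<Func<Product, bool>> asTree = p => p.Id > 100 && p.Id < 5000;

        BinaryExpression body = (BinaryExpression)asTree.Body;
        Console.WriteLine(body.NodeType);       // AndAlso
        Console.WriteLine(body.Left.NodeType);  // GreaterThan
        Console.WriteLine(body.Right.NodeType); // LessThan

        // A tree can still be turned into a runnable delegate on demand.
        Func<Product, bool> compiled = asTree.Compile();
        Product product = new Product { Id = 150, ProductName = "Widget" };
        Console.WriteLine(asDelegate(product) == compiled(product)); // True
    }
}
```

A provider such as LINQ to SQL walks a tree like this and produces its own representation of the filter (for example SQL text) instead of calling Compile().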
For example, this method:
private static IEnumerable<string> ProductNames(IEnumerable<Product> products)
{
    var productNames =
        from p in products
        where p.Id > 100 && p.Id < 5000
        select p.ProductName;
    return productNames;
}
is compiled to the following IL:
.method private hidebysig static class [mscorlib]System.Collections.Generic.IEnumerable`1<string> ProductNames(class [mscorlib]System.Collections.Generic.IEnumerable`1<class ConsoleApplication3.Product> products) cil managed
{
.maxstack 3
.locals init (
[0] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> enumerable,
[1] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> enumerable2)
L_0000: nop
L_0001: ldarg.0
L_0002: ldsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate3
L_0007: dup
L_0008: brtrue.s L_001d
L_000a: pop
L_000b: ldnull
L_000c: ldftn bool ConsoleApplication3.Program::<ProductNames>b__2(class ConsoleApplication3.Product)
L_0012: newobj instance void [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool>::.ctor(object, native int)
L_0017: dup
L_0018: stsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate3
L_001d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0> [System.Core]System.Linq.Enumerable::Where<class ConsoleApplication3.Product>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class [mscorlib]System.Func`2<!!0, bool>)
L_0022: ldsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, string> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate5
L_0027: dup
L_0028: brtrue.s L_003d
L_002a: pop
L_002b: ldnull
L_002c: ldftn string ConsoleApplication3.Program::<ProductNames>b__4(class ConsoleApplication3.Product)
L_0032: newobj instance void [mscorlib]System.Func`2<class ConsoleApplication3.Product, string>::.ctor(object, native int)
L_0037: dup
L_0038: stsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, string> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate5
L_003d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!1> [System.Core]System.Linq.Enumerable::Select<class ConsoleApplication3.Product, string>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class [mscorlib]System.Func`2<!!0, !!1>)
L_0042: stloc.0
L_0043: ldloc.0
L_0044: stloc.1
L_0045: br.s L_0047
L_0047: ldloc.1
L_0048: ret
}
Note that these are normal call instructions for the method calls. The lambdas get converted into other methods, such as:
[CompilerGenerated]
private static bool <ProductNames>b__2(Product p)
{
    // 0x1388 is 5000 in hexadecimal; the decompiler printed the constant that way.
    return ((p.Id > 100) && (p.Id < 0x1388));
}
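Putting the IL and the generated method together, the compiled ProductNames method behaves roughly like the hand-written C# below. This is only a sketch: the names IdInRange, GetName, cachedPredicate and cachedSelector are stand-ins invented here, because the real generated names (such as <ProductNames>b__2 and CS$<>9__CachedAnonymousMethodDelegate3) are not valid C# identifiers, and the Product class is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Product type; the post does not show its definition.
public class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

public static class CachedDelegates
{
    // Stand-ins for the compiler-generated static cache fields.
    private static Func<Product, bool> cachedPredicate;
    private static Func<Product, string> cachedSelector;

    // Stand-ins for the compiler-generated lambda methods.
    private static bool IdInRange(Product p)
    {
        return p.Id > 100 && p.Id < 5000;
    }

    private static string GetName(Product p)
    {
        return p.ProductName;
    }

    public static IEnumerable<string> ProductNames(IEnumerable<Product> products)
    {
        // ldsfld / brtrue.s / newobj / stsfld: create the delegate once, then reuse it.
        if (cachedPredicate == null)
        {
            cachedPredicate = new Func<Product, bool>(IdInRange);
        }
        IEnumerable<Product> filtered = Enumerable.Where(products, cachedPredicate);

        if (cachedSelector == null)
        {
            cachedSelector = new Func<Product, string>(GetName);
        }
        return Enumerable.Select(filtered, cachedSelector);
    }

    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { Id = 50, ProductName = "Too low" },
            new Product { Id = 150, ProductName = "Widget" },
            new Product { Id = 4999, ProductName = "Gadget" },
        };
        Console.WriteLine(string.Join(", ", ProductNames(products))); // Widget, Gadget
    }
}
```

The caching only avoids allocating a new delegate object on every call; it does not cache query results. Newer compiler versions may emit a different pattern for the same source, so the exact shape of the IL depends on the compiler used.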