Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In a Linq predicate, will the compiler optimize a "scalar" call to Enumerable.Min() or will it be called for each item?

I was just looking at the question "SubQuery using Lambda Expression" and wondered about compiler optimization of Linq predicates.

Suppose I had a List<string> called names, and I was looking for the items with the shortest string length. So we have the query names.Where(x => x.Length == names.Min(y => y.Length)) (from the question mentioned above). Simple enough.

Now, we know the C# specification does not allow you to modify a collection while enumerating it. So I believe it is technically safe to assume the above call to Min() will always return the same value for every call.

But, my hypothesis is the compiler truly has no way of knowing what the lambda inside the Enumerable.Min extension method returns. Since, for example we could do:

int i = 0;
return names.Where(x => x.Length == names.Min(y => ++i));

Which would mean the query in question is really O(n²) - the result of Min() will be calculated for each iteration. And to get the desired O(n) implementation, you would have to be explicit:

int minLength = names.Min(y => y.Length);
return names.Where(x => x.Length == minLength);

Is my hypothesis correct, or is there something special about Linq or the C# specification that allows the compiler to look inside the lambda and optimize this call to Min()?


@spender is absolutely correct. Consider the following snippet:

List<string> names = new List<string>(new[] { "r", "abcde", "bcdef", "cdefg", "q" });
return names.Where(x => 
{
    bool b = (x.Length == names.Min(y => y.Length)); 
    names = new List<string>(new[] { "ab" }); 
    return b; 
});

This will return only "r", and not "q", because while the old reference to names is being iterated (foreach x), the call to Min after the first iteration is actually called with the new instance of names. But, a human looking at the query in the top of the question can say for certain nothing gets modified. So my question still stands: is the compiler smart enough to see this?

like image 248
lc. Avatar asked Jun 14 '14 11:06

lc.


2 Answers

wondered about compiler optimization of Linq predicates.

The C# compiler does not know how the BCL types are implemented. It could look at the assemblies that you reference but those can change at any time. The compiler cannot assume that the machine the compiled program will run on will have the same binaries. Therefore, th C# compiler cannot legally perform these optimizations because you could tell the difference.

The JIT is in a position to make such optimizations (it does not at the moment).

Now, we know the C# specification does not allow you to modify a collection while enumerating it. So I believe it is technically safe to assume the above call to Min() will always return the same value for every call.

The specification of C# knows nothing about libraries. It does not say this at all. Each implementation of IEnumerable can decide whether it wants to allows such behavior or not.

But, my hypothesis is the compiler truly has no way of knowing what the lambda inside the Enumerable.Min extension method returns.

Yes, it could do anything. At runtime the JIT could deduce such properties but it does not. Note, that deducing even basic facts is hard because there are things like reflection, runtime code generation and multi-threading.

Is my hypothesis correct, or is there something special about Linq or the C# specification that allows the compiler to look inside the lambda and optimize this call to Min()?

No. LINQ has library-only optimizations. LINQ to objects is executed exactly as you wrote it. Other LINQ providers do this differently.

If you wonder whether the JIT will perform some advanced optimization the answer is usually no as of .NET 4.5.

like image 108
usr Avatar answered Oct 06 '22 16:10

usr


C# compiler works in passes. Each pass takes some complex language feature and converts it to simpler. Quite often context is lost in this conversion. Lambda expressions are one of those steps. Each lambda is converted to class and this class is then instantiated and it's main method is passed to the delegate. And the compilation pass doesn't even look inside the lambda. So compiler that produces the IL code doesn't even know there are any lambdas and just sees bunch of classes. And those classes doesn't give him enough information to infer what you propose.

like image 28
Euphoric Avatar answered Oct 06 '22 18:10

Euphoric