I've been working with the Entity Framework in .NET for the first time, and have been writing LINQ queries to get information from my model. I'd like to build good habits from the beginning, so I've been researching the best way to write these queries and get their results. Unfortunately, in browsing Stack Exchange, I seem to have come across two conflicting explanations of how deferred/immediate execution works with LINQ:
Demonstrated in the question "Slow foreach() on a LINQ query - ToList() boosts performance immensely - why is this?", the implication is that "ToList()" needs to be called in order to evaluate the query immediately, as the foreach is evaluating the query on the data source repeatedly, slowing down the operation considerably.
Another example is the question "Foreaching through grouped linq results is incredibly slow, any tips?", where the accepted answer also implies that calling "ToList()" on the query will improve performance.
Demonstrated in the question "Does foreach execute the query only once?", the implication is that the foreach causes one enumeration to be established, and will not query the datasource each time.
Continued browsing of the site has turned up many questions where "repeated execution during a foreach loop" is the culprit of the performance concern, and plenty of other answers stating that a foreach will appropriately grab a single query from a datasource, which means that both explanations seem to have validity. If the "ToList()" hypothesis is incorrect (as most of the current answers as of 2013-06-05 1:51 PM EST seem to imply), where does this misconception come from? Is there one of these explanations that is accurate and one that isn't, or are there different circumstances that could cause a LINQ query to evaluate differently?
Edit: In addition to the accepted answer below, I've turned up the following question over on Programmers that very much helped my understanding of query execution, particularly the pitfalls that could result in multiple datasource hits during a loop, which I think will be helpful for others interested in this question: https://softwareengineering.stackexchange.com/questions/178218/for-vs-foreach-vs-linq
LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code.
Use LINQ because you want shorter, more readable, and more maintainable code.
LINQ queries are always executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution. You can also force a query to execute immediately, which is useful for caching query results. This is described later in this topic.
Yes, it's slower.
The code above will execute the LINQ query multiple times. Not because of the foreach, but because the foreach is inside another loop, so the foreach itself is being executed multiple times.
A foreach loop is a convenient way to walk through the results of a LINQ query. It retrieves the elements one at a time, and it works with any sequence produced by the LINQ extension methods.
In general LINQ uses deferred execution. If you use methods like First() and FirstOrDefault(), the query is executed immediately. When you do something like:
foreach(string s in MyObjects.Select(x => x.AStringProp))
The results are retrieved in a streaming manner, meaning one by one. Each time the foreach calls MoveNext on the enumerator, the projection is applied to the next object. If you were to have a Where, it would first apply the filter, then the projection.
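To make the streaming order concrete, here is a minimal sketch against LINQ to Objects. The class name StreamingDemo, the sample array, and the log list are made up for illustration; other providers may order their work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Returns the order in which the filter, the projection and the loop body ran.
    public static List<string> Run()
    {
        var log = new List<string>();
        int[] source = { 1, 2, 3, 4 };

        // Nothing runs here: the query only describes the work to do.
        IEnumerable<string> query = source
            .Where(x => { log.Add($"filter {x}"); return x % 2 == 0; })
            .Select(x => { log.Add($"project {x}"); return $"item{x}"; });

        // Each step of the loop pulls one element through Where, then Select.
        foreach (string s in query)
        {
            log.Add($"body {s}");
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
        // Prints: filter 1, filter 2, project 2, body item2,
        //         filter 3, filter 4, project 4, body item4 (one per line)
    }
}
```

The filter and projection calls are interleaved with the loop body rather than running as separate passes over the whole collection.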
If you do something like:
List<string> names = People.Select(x => x.Name).ToList();
foreach (string name in names)
Then I believe this is a wasteful operation. ToList() will force the query to be executed, enumerating the People list and applying the x => x.Name projection. Afterwards you will enumerate the list again. So unless you have a good reason to have the data in a list (rather than IEnumerable) you're just wasting CPU cycles.
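The difference between the two forms can be sketched like this. The People array of plain strings is a hypothetical stand-in for the collection in the answer, and ToListDemo and its methods are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListDemo
{
    // Hypothetical stand-in for the People collection mentioned above.
    private static readonly string[] People = { "Ann", "Bob" };

    // foreach directly over the query: projection and body alternate.
    public static List<string> Streamed()
    {
        var log = new List<string>();
        var query = People.Select(p => { log.Add($"project {p}"); return p.ToUpperInvariant(); });
        foreach (string name in query)
        {
            log.Add($"body {name}");
        }
        return log;
    }

    // ToList() first: every projection runs and a list is allocated
    // before the loop body sees its first element.
    public static List<string> Materialized()
    {
        var log = new List<string>();
        List<string> names = People
            .Select(p => { log.Add($"project {p}"); return p.ToUpperInvariant(); })
            .ToList();
        foreach (string name in names)
        {
            log.Add($"body {name}");
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Streamed()));
        // project Ann, body ANN, project Bob, body BOB
        Console.WriteLine(string.Join(", ", Materialized()));
        // project Ann, project Bob, body ANN, body BOB
    }
}
```

Both versions run the projection once per element; the ToList() version simply adds an intermediate list and a second pass over it, which only pays off if the results are needed again later.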
Generally speaking using a LINQ query on the collection you're enumerating with a foreach will not have worse performance than any other similar and practical options.
Also it's worth noting that people implementing LINQ providers are encouraged to make the common methods work as they do in the Microsoft provided providers but they're not required to. If I were to go write a LINQ to HTML or LINQ to My Proprietary Data Format provider there would be no guarantee that it behaves in this manner. Perhaps the nature of the data would make immediate execution the only practical option.
Also, final edit: if you're interested in this, Jon Skeet's C# In Depth is very informative and a great read. My answer summarizes a few pages of the book (hopefully with reasonable accuracy), but if you want more details on how LINQ works under the covers, it's a good place to look.
Try this in LINQPad:
void Main()
{
    var testList = Enumerable.Range(1, 10);
    var query = testList.Where(x =>
    {
        Console.WriteLine(string.Format("Doing where on {0}", x));
        return x % 2 == 0;
    });

    Console.WriteLine("First foreach starting");
    foreach (var i in query)
    {
        Console.WriteLine(string.Format("Foreached where on {0}", i));
    }
    Console.WriteLine("First foreach ending");

    Console.WriteLine("Second foreach starting");
    foreach (var i in query)
    {
        Console.WriteLine(string.Format("Foreached where on {0} for the second time.", i));
    }
    Console.WriteLine("Second foreach ending");
}
Each time the Where delegate runs, it writes a line to the console, so we can see exactly when the LINQ query is evaluated. Looking at the console output, the second foreach loop still causes "Doing where on" to print, which shows that the second foreach does in fact run the Where clause again, potentially causing a slowdown.
First foreach starting
Doing where on 1
Doing where on 2
Foreached where on 2
Doing where on 3
Doing where on 4
Foreached where on 4
Doing where on 5
Doing where on 6
Foreached where on 6
Doing where on 7
Doing where on 8
Foreached where on 8
Doing where on 9
Doing where on 10
Foreached where on 10
First foreach ending
Second foreach starting
Doing where on 1
Doing where on 2
Foreached where on 2 for the second time.
Doing where on 3
Doing where on 4
Foreached where on 4 for the second time.
Doing where on 5
Doing where on 6
Foreached where on 6 for the second time.
Doing where on 7
Doing where on 8
Foreached where on 8 for the second time.
Doing where on 9
Doing where on 10
Foreached where on 10 for the second time.
Second foreach ending
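As a follow-up sketch (not part of the LINQPad run above), the same query can be counted with and without ToList() to see when materializing helps. CachingDemo and CountPredicateCalls are names made up for this example, and it uses LINQ to Objects only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CachingDemo
{
    // Returns how many times the Where predicate ran across two foreach loops.
    public static int CountPredicateCalls(bool materialize)
    {
        int calls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 10)
            .Where(x => { calls++; return x % 2 == 0; });

        IEnumerable<int> results = query;
        if (materialize)
        {
            // ToList() runs the query once and keeps the results in memory.
            results = query.ToList();
        }

        foreach (int i in results) { }
        foreach (int i in results) { }
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountPredicateCalls(materialize: false)); // 20
        Console.WriteLine(CountPredicateCalls(materialize: true));  // 10
    }
}
```

So ToList() is worth its cost when the same results are enumerated more than once. With a database-backed provider such as Entity Framework, each extra enumeration would typically mean another trip to the data source, which is probably where the "ToList() made it faster" reports come from.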