Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does "where" position in LINQ query matter when joining in-memory?

Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where clause. This where only filters on properties included in the original set (the from part of the query).

Question: Does the linq query interpreter optimize this query in that it first executes the where before it performs the join, regardless of whether I write the where before or after the join? – so it does not have to perform a join on elements that are not included later anyways.

Example: For example, I have a categories list I want to join with a products list. However, I am just interested in the category with ID 1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:

from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };

or

from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };

Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.

Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)

Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.

Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".

like image 797
B12Toaster Avatar asked Dec 18 '18 11:12

B12Toaster


People also ask

Can we use multiple where clause in LINQ query?

A single query expression may have multiple where clauses.

How does join work in LINQ?

In a LINQ query expression, join operations are performed on object collections. Object collections cannot be "joined" in exactly the same way as two relational tables. In LINQ, explicit join clauses are only required when two source sequences are not tied by any relationship.

What is where clause in LINQ?

Where Clause in Query Syntax: The where clause is used to filter the query according to the given condition. You can provide a condition to where clause using lambda expression or by using Func delegate type. Where clause supports query syntax in both C# and VB.Net languages.

How Use left and right join in LINQ?

In LINQ, LEFT JOIN or LEFT OUTER JOIN is used to return all the records or elements from the left side collection and matching the elements from the right side of the collection. In LINQ, to achieve the LEFT Join behavior, it is mandatory to use the "INTO" keyword and "DefaultfEmpty()" method.


2 Answers

The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.

The first LINQ query will be compiled to the following method chain:

categories
    .Join(
        products,
        category => category.ID,
        prod => prod.CategoryID,
        (category, prod) => new { category, prod })
    .Where(t => t.category.ID == 1)
    .Select(t => new { Category = t.category.Name, Product = t.prod.Name });

The second one:

categories
    .Where(category => category.ID == 1)
    .Join(
        products,
        category => category.ID,
        prod => prod.CategoryID,
        (category, prod) => new { Category = category.Name, Product = prod.Name });

As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).

Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.

There will be no additional query optimization in case of LINQ-to-objects queries.

So the second version is preferable.

like image 73
dymanoid Avatar answered Oct 18 '22 08:10

dymanoid


For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.

I also tried result by first casting it to IQueryable then apply filtering but apparently casting time is pretty high for this big table.

I made a quick test for this case.

Console.WriteLine($"List Row Count = {list.Count()}"); 
Console.WriteLine($"JoinList Row Count = {joinList.Count()}"); 

var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
   .Where(t => t.inner.Prop3 == "Prop13")
   .Select(t => new { t.inner.Prop4, t.lst.Prop2}); 
result.Dump();
watch.Stop();

Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");

watch.Restart();
var result2 = list
   .Where(t => t.Prop3 == "Prop13")
   .Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
   .Select(t => new { t.inner.Prop4, t.lst.Prop2});

result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}"); 

watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
   .Where(t => t.inner.Prop3 == "Prop13")
   .Select(t => new { t.inner.Prop4, t.lst.Prop2}); 
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}"); 

Findings:

List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591

List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586

List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277

List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
like image 31
Derviş Kayımbaşıoğlu Avatar answered Oct 18 '22 10:10

Derviş Kayımbaşıoğlu