Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Linq do joins differently

Tags:

OK this is a little moan but it's also a question. In Linq I can do a join like this:

from c in dc.Customers join o in dc.Orders on c.custid equals o.custid ...

All well and good and completely rememberable without having to go back and google it. However left joins are far more complicated for some reason:

from c in dc.Customers 
join o in dc.Orders on c.custid equals o.custid 
into temp from x in temp.DefaultIfEmpty() ...

So my question is why couldn't the designers of Linq make things simple (more sql like) with something like this:

from c in dc.Customers left join o in dc.Orders on c.custid equals o.custid ...

Cheers Lee

like image 390
user52110 Avatar asked Feb 11 '09 14:02

user52110


1 Answers

why couldn't the designers of Linq make things simple (more sql like)

They could have. But your definition of simple (as a sql programmer) is not the same as the OO programmer's definition of simple. Linq (in C#) is a query technology for OO programmers, first. An example of this is why select comes last. That's to fulfill scope rules in C# and intellisense support in the editor.

These programmers maybe don't get LEFT JOIN (and get really confused if you say LEFT OUTER JOIN - thinking there's some difference, like one inherits from the other).

What they do understand is GROUP JOIN, which behaves in a similar manner.

List<int> myInts = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
List<int> myOtherInts = new List<int>() { 1, 3, 5, 7, 9, 11, 13 };
//
var query = from i in myInts
    join j in myOtherInts on i equals j into g
    select new {key = i, myGroup = g};
//
foreach (var grouping in query)
{
  Console.WriteLine("--{0}", grouping.key);
  foreach (var x in grouping.myGroup)
    Console.WriteLine(x);
}

All that DefaultIfEmpty stuff does is unpack the group - flattening the results into row/column form - away from the OO programmer's natural heirarchical form. DefaultIfEmpty is not semantically necessary to get to the results.

Here's the same query in method form - which the compiler generates from the above and which I prefer:

var query = myInts.GroupJoin(
    myOtherInts,
    i => i,
    j => j,
    (i, g) => new { key = i, myGroup = g }
);

Could you state that in terms of his example?

This query gives you customers, with their orders as an attached collection. The orders collection might be empty. If you have 50 customers and 1000 orders, you will have 50 customers in the result.

from c in dc.Customers
join o in dc.Orders on c.custid equals o.custid into someOrders
select new CustomerWithOrders()
  {theCustomer = c, theOrders = someOrders};

This query gives you a CustomerOrder row. If a customer has 5 orders, the customer will appear 5 times, each time matched to a different order. If the customer has 0 orders, the customer will appear once matched to a null order. If you have 50 customers and 1000 orders, you will have 50-1049 rows after the join and the meaning of an element in the result is hard to define.

from c in dc.Customers
join o in dc.Orders on c.custid equals o.custid into temp
from x in temp.DefaultIfEmpty()
select new CustomerOrderHybrid() {theCustomer = c, theOrder = x}

If they implemented left join, it would require the result shape of the second example. Once I used the group join, which is better, I wouldn't implement the left join in one step either. Hierarchical shaping of query results is great.

like image 159
Amy B Avatar answered Oct 01 '22 22:10

Amy B