
Two similar LINQ queries, completely different generated SQL

I'm running into a problem with the following pseudoquery:

var daily = from p in db.table1
            group p by new
            {
                p.key1,
                p.key2
            } into g
            join d in db.table2
            on new { g.Key.key1, g.Key.key2 } equals new { d.key1, d.key2 }
            select new
            {
                col1 = g.Key.key1,
                col2 = g.Sum(a => a.column2),
                col3 = d.column3
            };

It runs, but the SQL that LINQ sends to SQL Server is absurd. The actual implementation follows a similar setup with 7 or so more columns, each with its own .Sum() calculation. The generated SQL has around 10-11 nested SELECT statements, no INNER JOIN and, of course, takes forever to run.

I tested out another implementation of the query:

var daily = from p in
                (from p in db.table1
                 group p by new
                 {
                     p.key1,
                     p.key2
                 } into g
                 select new
                 {
                     key1 = g.Key.key1,
                     key2 = g.Key.key2,
                     col2 = g.Sum(a => a.column2)
                 })
            join d in db.table2
            on new { p.key1, p.key2 } equals new { d.key1, d.key2 }
            select new
            {
                col1 = p.key1,
                col2 = p.col2,
                col3 = d.column3
            };

This version generates far more reasonable SQL with a single sub-SELECT and an INNER JOIN (it also runs damn near instantly). What I hate about it is that the first LINQ query is, IMHO, far more straightforward and concise, whereas the second seems rather redundant since I end up having to define all the columns I want from table1 twice.

Why do these two similar queries perform so differently on the server, and why does query 2 end up being far more efficient even though its code is far less expressive?

Is there a way I can rewrite the first query to be as efficient as the second?

Kittoes0124 asked Dec 28 '12 00:12



1 Answer

LINQ to SQL has a problem with the following pattern:

from t in table
group t by key into g
from t in g //"ungroup" the grouping - this is causing a problem
select ...

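I think your join triggers this because it "ungroups" the grouping: after the join, each g.Sum(...) still has to be evaluated over the original group. Note that a grouping (like the result of a GroupJoin) is a nested collection, which SQL cannot represent directly. Think about it: how would you translate my example query? You would have to join the table to a grouped version of itself, which causes enormous redundancy.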

I have seen this problem a few times. You have found the correct workaround: force a projection so that this pattern cannot occur.

There is a slightly less awkward version:

var daily = from p in db.table1
            group p by new
            {
                p.key1,
                p.key2
            } into g
            select new
            {
                key1 = g.Key.key1,
                key2 = g.Key.key2,
                col2 = g.Sum(a => a.column2)
            } into p
            join d in db.table2 on new { p.key1, p.key2 } equals new { d.key1, d.key2 }
            select new
            {
                col1 = p.key1,
                col2 = p.col2,
                col3 = d.column3
            };

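The nesting is removed by the lesser-known "select ... into p" syntax (a query continuation), which lets you keep querying the projected result without wrapping it in a subquery.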
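To check that the continuation really is the same query as the nested version, here is a minimal LINQ to Objects sketch. The record types Row1/Row2, their Key1, Key2, Column2 and Column3 properties, and the sample rows are hypothetical stand-ins for db.table1/db.table2, and it assumes C# 9 or later (top-level statements, records). Because it runs in memory, it only demonstrates that the three forms below are the same query shape; it says nothing about the SQL that LINQ to SQL generates for them.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for db.table1 and db.table2 (LINQ to Objects,
// so this shows the query *shape*, not the SQL that LINQ to SQL would emit).
var table1 = new[]
{
    new Row1(1, 10, 5m),
    new Row1(1, 10, 7m),
    new Row1(2, 20, 3m),
};
var table2 = new[]
{
    new Row2(1, 10, "A"),
    new Row2(2, 20, "B"),
};

// 1. Query continuation: "select ... into p" starts a new query over the projection.
var viaContinuation =
    from t in table1
    group t by new { t.Key1, t.Key2 } into g
    select new { g.Key.Key1, g.Key.Key2, Total = g.Sum(a => a.Column2) } into p
    join d in table2 on new { p.Key1, p.Key2 } equals new { d.Key1, d.Key2 }
    select new { Col1 = p.Key1, Col2 = p.Total, Col3 = d.Column3 };

// 2. The same query written with an explicit nested subquery.
var viaSubquery =
    from p in (from t in table1
               group t by new { t.Key1, t.Key2 } into g
               select new { g.Key.Key1, g.Key.Key2, Total = g.Sum(a => a.Column2) })
    join d in table2 on new { p.Key1, p.Key2 } equals new { d.Key1, d.Key2 }
    select new { Col1 = p.Key1, Col2 = p.Total, Col3 = d.Column3 };

// 3. Essentially what the compiler translates both forms into: GroupBy -> Select -> Join.
var viaMethods = table1
    .GroupBy(t => new { t.Key1, t.Key2 })
    .Select(g => new { g.Key.Key1, g.Key.Key2, Total = g.Sum(a => a.Column2) })
    .Join(table2,
          p => new { p.Key1, p.Key2 },
          d => new { d.Key1, d.Key2 },
          (p, d) => new { Col1 = p.Key1, Col2 = p.Total, Col3 = d.Column3 });

// Anonymous types with the same property names, types and order are the same type
// within an assembly and compare by value, so SequenceEqual compares row contents.
Console.WriteLine(viaContinuation.SequenceEqual(viaSubquery)); // True
Console.WriteLine(viaContinuation.SequenceEqual(viaMethods));  // True

foreach (var row in viaContinuation)
    Console.WriteLine(row);

// Hypothetical row types standing in for the table1/table2 entities.
record Row1(int Key1, int Key2, decimal Column2);
record Row2(int Key1, int Key2, string Column3);
```

In every form the aggregation is finished before the join, which is the property the workaround depends on; the continuation just avoids spelling out the inner query as a parenthesized subquery.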

usr answered Nov 15 '22 06:11
