Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ and Entity Framework - Avoiding subqueries

I'm having really hard time tuning up one of my Entity Framework generated queries in my application. It is very basic query but for some reason EF uses multiple inner subqueries which seem to perform horribly in DB instead of using joins.

Here's my LINQ code:

Projects.Select(proj => new ProjectViewModel()
                {
                    Name = proj.Name,
                    Id = proj.Id,
                    Total = proj.Subvalue.Where(subv =>
                        subv.Created >= startDate
                        && subv.Created <= endDate
                        &&
                        (subv.StatusId == 1 ||
                         subv.StatusId == 2))
                        .Select(c => c.SubValueSum)
                        .DefaultIfEmpty()
                        .Sum()
                })
                .OrderByDescending(c => c.Total)
                .Take(10);

EF generates really complex query with multiple subqueries which has awful query performance like this:

SELECT TOP (10) 
[Project3].[Id] AS [Id], 
[Project3].[Name] AS [Name], 
[Project3].[C1] AS [C1]
FROM ( SELECT 
    [Project2].[Id] AS [Id], 
    [Project2].[Name] AS [Name], 
    [Project2].[C1] AS [C1]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[Name] AS [Name], 
        (SELECT 
            SUM([Join1].[A1]) AS [A1]
            FROM ( SELECT 
                CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
                FROM   ( SELECT 1 AS X ) AS [SingleRowTable1]
                LEFT OUTER JOIN  (SELECT 
                    [Extent2].[SubValueSum] AS [SubValueSum], 
                    cast(1 as tinyint) AS [C1]
                    FROM [dbo].[Subvalue] AS [Extent2]
                    WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
            )  AS [Join1]) AS [C1]
        FROM [dbo].[Project] AS [Extent1]
        WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
    )  AS [Project2]
)  AS [Project3]
ORDER BY [Project3].[C1] DESC;

The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:

select 
    TOP (10)
    Proj.Id,
    Proj.Name,
    SUM(Subv.SubValueSum) AS Total
from 
    SubValue as Subv
left join
    Project as Proj on Proj.Id = Subv.ProjectId
where
    Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
    Proj.Id,
    Proj.Name
order by 
    Total DESC

The execution time is near instant; below 30ms.

The problem clearly lies in my ability to write good EF queries with LINQ but no matter what I try to do (using Linqpad for testing) I just can't write similar performant query with LINQ\EF as I can write by hand. I've trie querying the SubValue table and Project table but the endcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.

How can I write a query which imitates the hand written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use Joins when I want to instead of nested subqueries.

like image 571
veturi Avatar asked Sep 23 '15 10:09

veturi


1 Answers

EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.

To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue and you can do this by creating a join (which is also what you do in you hand crafted SQL):

var query = from proj in context.Project
            join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
            from subv in s2.DefaultIfEmpty()
            select new { proj, subv } into x
            group x by new { x.proj.Id, x.proj.Name } into g
            select new {
              g.Key.Id,
              g.Key.Name,
              Total = g.Select(y => y.subv.SubValueSum).Sum()
            } into y
            orderby y.Total descending
            select y;
var result = query.Take(10);

The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.

The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.

Finally, ordering and TOP(10) is applied.

The generated SQL still contains subqueries but I would expect it to more efficient compared to SQL generated by your query:

SELECT TOP (10)
    [Project1].[Id] AS [Id],
    [Project1].[Name] AS [Name],
    [Project1].[C1] AS [C1]
    FROM ( SELECT
        [GroupBy1].[A1] AS [C1],
        [GroupBy1].[K1] AS [Id],
        [GroupBy1].[K2] AS [Name]
        FROM ( SELECT
            [Extent1].[Id] AS [K1],
            [Extent1].[Name] AS [K2],
            SUM([Extent2].[SubValueSum]) AS [A1]
            FROM  [dbo].[Project] AS [Extent1]
            LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= @p__linq__0) AND ([Extent2].[Created] <= @p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
            GROUP BY [Extent1].[Id], [Extent1].[Name]
        )  AS [GroupBy1]
    )  AS [Project1]
    ORDER BY [Project1].[C1] DESC
like image 128
Martin Liversage Avatar answered Oct 06 '22 00:10

Martin Liversage