Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Entity Framework generating a JOIN on SELECT

I am using Entity Framework in a C# application and I am using Lazy Loading. We noticed one query has an extremely high impact on our CPU, which merely calculates a sum. When debugging the query generated by Entity Framework it creates a INNER JOIN (SELECT ... which is not performant. When I manually change the query to a proper JOIN the query time goes from 1.3 sec to 0.03 sec.

Let me illustrate it with a simplified version of my code.

public decimal GetPortfolioValue(Guid portfolioId)
{
   var value = DbContext.Portfolios
        .Where( x => x.Id.Equals(portfolioId) )
        .SelectMany( p => p.Items
            .Where( i => i.Status == ItemStatusConstants.Subscribed 
                && _activeStatuses.Contains( i.Category.Status ) )
        )
        .Select( i => i.Amount )
        .DefaultIfEmpty(0)
        .Sum();

   return value;
}

This generates a query that selects the sum but does an inner join on a SELECT of two tables joined together. I created a pastebin here for the generated query not to pollute this question but a shortened version would be:

SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN (SELECT 
               `Extent2`.*,
               `Extent3`.*
            FROM `items` AS `Extent2`
            INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` = 
`Extent2`.`category_id`) AS `Join1`
ON `Extent1`.`id` = `Join1`.`portfolio_id`
    AND ((`Join1`.`status` = @gp1)
    AND (`Join1`.`STATUS1` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...

The query I'd expect it to generate (and which takes 0.03 sec instead of 1.3 sec) would be something like

SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN `items` AS `Extent2` ON `Extent2`.`portfolio_id` = `Extent1`.`id`
INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` = `Extent2`.`category_id`
    AND ((`Extent2`.`status` = @gp1)
    AND (`Extent3`.`status` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...

I suspect it's due to the .SelectMany but I don't see how I should rewrite the LINQ query to make it more efficient. As for the Entities, the linking properties are virtual and have a foreign key configured:

public class Portfolio
{
   public Guid Id { get; set; }
   public virtual ICollection<Item> Items { get; set; }
}

public class Item
{
   public Guid Id { get; set; }
   public Guid PortfolioId { get; set; }
   public Guid CategoryId { get; set; }
   public decimal Amount { get; set; }
   public string Status { get; set; }
   public virtual Portfolio Portfolio { get; set; }
   public virtual Category Category { get; set; }
}

public class Category
{
   public Guid Id { get; set; }
   public string Status { get; set; }
   public virtual ICollection<Item> Items { get; set; }
}

Any help would be greatly appreciated!

like image 450
Jules Avatar asked Sep 01 '25 03:09

Jules


1 Answers

As you dont need anything from Portfolio, just filter by PortfolioId, you could query directly PortfolioItems. Assuming your DbContext has a DbSet with all items in all portfolios, could be something like this:

var value = DbContext.PortfolioItems
                     .Where(i => i.PortfolioId == portfolioId && i.Status == ItemStatusConstants.Subscribed && _activeStatuses.Contains(i.Category.Status))
                     .Sum(i=>i.Amount);                 

I believe you don't need the DefaultIfEmpty nor the select if you use directly the appropiate Queryable.Sum overload.

EDITED: Tried two different LINQ queries without exposing a DbSet.

The first query is basically the same as yours:

var value2 = dbContext.Portfolios
    .Where(p => p.Id == portfolioId)
    .SelectMany(p => p.Items)
    .Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
    .Select(i=>i.Amount)
    .DefaultIfEmpty()
    .Sum();

Profiled the query in SQL Server (don't have MySql at hand) and produces an ugly sentence (parameters replaced and quotes unescaped for testing):

SELECT [GroupBy1].[a1] AS [C1] 
FROM   (SELECT Sum([Join2].[a1_0]) AS [A1] 
    FROM   (SELECT CASE 
                     WHEN ( [Project1].[c1] IS NULL ) THEN Cast( 
                     0 AS DECIMAL(18)) 
                     ELSE [Project1].[amount] 
                   END AS [A1_0] 
            FROM   (SELECT 1 AS X) AS [SingleRowTable1] 
                   LEFT OUTER JOIN 
                   (SELECT [Extent1].[amount] AS [Amount], 
                           Cast(1 AS TINYINT) AS [C1] 
                    FROM   [dbo].[items] AS [Extent1] 
                           INNER JOIN [dbo].[categories] AS 
                                      [Extent2] 
                                   ON [Extent1].[categoryid] = 
                                      [Extent2].[id] 
                    WHERE  ( N'A' = [Extent1].[status] ) 
                           AND ( [Extent1].[portfolioid] = 
                                 'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' ) 
                           AND ( [Extent2].[status] IN ( N'A', N'B', N'C' ) 
                               ) 
                           AND ( [Extent2].[status] IS NOT NULL )) AS 
                   [Project1] 
                                ON 1 = 1) AS [Join2]) AS [GroupBy1] 

If we remove the "Select" and "DefaultIfEmpty" methods, and rewrite the query as this:

var value = dbContext.Portfolios
    .Where(p => p.Id == portfolioId)
    .SelectMany(p => p.Items)
    .Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
    .Sum(i => i.Amount);

The generated sentence is much cleaner:

SELECT [GroupBy1].[a1] AS [C1] 
FROM   (SELECT Sum([Extent1].[amount]) AS [A1] 
    FROM   [dbo].[items] AS [Extent1] 
           INNER JOIN [dbo].[categories] AS [Extent2] 
                   ON [Extent1].[categoryid] = [Extent2].[id] 
    WHERE  ( N'A' = [Extent1].[status] ) 
           AND ( [Extent1].[portfolioid] = 
                 'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' ) 
           AND ( [Extent2].[status] IN ( N'A', N'B', N'C' ) ) 
           AND ( [Extent2].[status] IS NOT NULL )) AS [GroupBy1] 

Conclussion: We cannot rely on LINQ provider to create optimized queries. The linq query must be analyzed and optimized even before thinking in SQL sentence generated.

like image 148
Pepelui360 Avatar answered Sep 02 '25 15:09

Pepelui360