Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Entity Framework generates inefficient SQL for paged query

I have a simple paged linq query against one entity:

var data = (from t in ctx.ObjectContext.Widgets
           where t.CampaignId == campaignId && 
                 t.CalendarEventId == calendarEventId
                 (t.RecurringEventId IS NULL OR t.RecurringEventId = recurringEventId)
           select t);

data = data.OrderBy(t => t.Id);

if (page > 0)
{
    data = data.Skip(rows * (page - 1)).Take(rows);
}

var l = data.ToList(); 

I expected it to generate SQL similar to:

select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id

When I run the above query in SSMS, it returns quickly (had to rebuild my indexes first).

However, the generated SQL is different. It contains a nested query as shown below:

SELECT TOP (50) 
[Project1].[Id] AS [Id], 
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id], 
[Project1].[CampaignId] AS [CampaignId], 
<redacted>, 
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[CampaignId] AS [CampaignId], 
        <redacted>
        FROM [dbo].[Widgets] AS [Extent1]
        WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
    )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC

The Widgets table is enormous and the inner query returns 100000s of records, causing a timeout.

Is there anything I can do to change the generation? Anything I am doing wrong?

UPDATE

I finally managed to refactor my code to return the results relatively quickly:

var data = (from t in ctx.ObjectContext.Widgets
           where t.CampaignId == campaignId && 
                 t.CalendarEventId == calendarEventId
                 (t.RecurringEventId IS NULL OR t.RecurringEventId = recurringEventId)
           select t)).AsEnumerable().Select((item, index) => new { Index = index, Item = item });

            data = data.OrderBy(t => t.Index);

            if (page > 0)
            {
                data = data.Where(t => t.Index >= (rows * (page - 1)));
            }

            data = data.Take(rows);

Note, the page > 0 logic is simply used to prevent an invalid parameter being used; it does no optimization. In fact page > 1 , while valid, does not provide any noticeable optimization for the 1st page; since the Where is not a slow operation.

like image 238
Kev Avatar asked Jul 21 '15 21:07

Kev


People also ask

Which technique improves the performance mostly in Entity Framework?

The AsNoTracking method tells Entity Framework to stop that additional work and so, it can improve the performance of your application. So, in theory, a query with AsNoTracking should perform better than without.

How does Entity Framework affect the connection with the database?

Because an open connection to the database consumes a valuable resource, the Entity Framework opens and closes the database connection only as needed. You can also explicitly open the connection. For more information, see Managing Connections and Transactions. Once in each application domain.

Is Dapper faster than Entity Framework Core?

Dapper is literally much faster than Entity Framework Core considering the fact that there are no bells and whistles in Dapper. It is a straight forward Micro ORM that has minimal features as well. It is always up to the developer to choose between these 2 Awesome Data Access Technologies.


1 Answers

Prior SQL Server 2012, the generated SQL code is the best way to perform pagging. Yes, it is awfull and very inefficient but is the best you can do even writing your own SQL scritp by hand. There are tons of digital ink about this in the net. Just google it.

In the firt page, this can be optimized not doing Skip and just Take but in any other page you are f***** up.

A workarround could be to generate your own row_number in persistence (an auto-identity could work) and just do where(widget.number > (page*rows) ).Take(rows) in code. If there is a good index in your widget.number the query should be very fast. But, this breaks the dynamic orderBy.

However, I can see in your code that you are ordering by widget.id always; so, if dynamic orderBy is not essential, this could be a valid workaround.

Will you take your own medicine?

could you ask me.

No, I will not. The best way to deal with this is having a persistence read-model in wich you can even have one table per widget orderBy field with its own widget.number. The problem is that modeling a system with a persistence read-model just for this issue is too crazy. Having a read-model is part of the overall design of your system and requires taking it in account from the very beginning of the design and development of a system.

like image 75
jlvaquero Avatar answered Sep 24 '22 05:09

jlvaquero