Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query generated by EF takes too much time to execute

I have a very simple query which is generated by Entity-Framework, Sometimes when I try to run this query It almost takes more than 30 seconds to be executed, and I got time out Exception.

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM ( SELECT [Extent1].[LinkID] AS [LinkID], [Extent1].[Title] AS [Title], [Extent1].[Url] AS [Url], [Extent1].[Description] AS [Description], [Extent1].[SentDate] AS [SentDate], [Extent1].[VisitCount] AS [VisitCount], [Extent1].[RssSourceId] AS [RssSourceId], [Extent1].[ReviewStatus] AS [ReviewStatus], [Extent1].[UserAccountId] AS [UserAccountId], [Extent1].[CreationDate] AS [CreationDate], row_number() OVER (ORDER BY [Extent1].[SentDate] DESC) AS [row_number]
    FROM [dbo].[Links] AS [Extent1]
)  AS [Extent1]
WHERE [Extent1].[row_number] > 0
ORDER BY [Extent1].[SentDate] DESC

And the code which is generating the Query is:

public async Task<IQueryable<TEntity>> GetAsync(Expression<Func<TEntity, bool>> filter = null,
    Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy = null)
{
    return await Task.Run(() =>
    {
        IQueryable<TEntity> query = _dbSet;
        if (filter != null)
        {
            query = query.Where(filter);
        }

        if (orderBy != null)
        {
            query = orderBy(query);
        }

        return query;
    });
}

Note that when I remove inner Select statement and Where clause and change it to following, Query executes fine in a less than a second.

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
.
.
.
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

Any advice will be helpful.

UPDATE:

Here is the usage of Above code:

var dbLinks = await _uow.LinkRespository.GetAsync(filter, orderBy);
var pagedLinks = new PagedList<Link>(dbLinks, pageNumber, PAGE_SIZE);
var vmLinks = Mapper.Map<IPagedList<LinkViewItemViewModel>>(pagedLinks);

And filter:

var result = await GetLinks(null, pageNo, a => a.OrderByDescending(x => x.SentDate));
like image 706
saber Avatar asked Mar 19 '15 07:03

saber


2 Answers

It never occurred to me that you simply didn't have an index. Lesson learnt - always check the basics before digging further.


If you don't need pagination, then the query can be simplified to

SELECT TOP (10) 
    [Extent1].[LinkID] AS [LinkID], 
    [Extent1].[Title] AS [Title], 
    ...
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

and it runs fast, as you've verified.

Apparently, you do need the pagination, so let's see what we can do.

The reason why your current version is slow, because it scans the whole table first, calculates row number for each and every row and only then returns 10 rows. I was wrong here. SQL Server optimizer is pretty smart. The root of your problem is somewhere else. See my update below.


BTW, as other people mentioned, this pagination will work correctly only if SentDate column is unique. If it is not unique, you need to ORDER BY SentDate and another unique column like some ID to resolve ambiguity.

If you don't need ability to jump straight to particular page, but rather always start with page 1, then go to next page, next page and so on, then the proper efficient way to do such pagination is described in this excellent article: http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way The author uses PostgreSQL for illustration, but the technique works for MS SQL Server as well. It boils down to remembering the ID of the last row on the shown page and then using this ID in the WHERE clause with appropriate supporting index to retrieve the next page without scanning all previous rows.

SQL Server 2008 doesn't have a built-in support for pagination, so we'll have to use workaround. I will show one variant that allows to jump straight to a given page and would work fast for first pages, but would become slower and slower for further pages.

You will have these variables (PageSize, PageNumber) in your C# code. I put them here to illustrate the point.

DECLARE @VarPageSize int = 10; -- number of rows in each page
DECLARE @VarPageNumber int = 3; -- page numeration is zero-based

SELECT TOP (@VarPageSize)
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM [dbo].[Links] AS [Extent1]
        ORDER BY [Extent1].[SentDate] DESC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] ASC
;

The first page is rows 1 to 10, second page is 11 to 20 and so on. Let's see how this query works when we try to get the fourth page, i.e. rows 31 to 40. PageSize=10, PageNumber=3. In the inner query we select first 40 rows. Note, that we don't scan the whole table here, we scan only first 40 rows. We don't even need explicit ROW_NUMBER(). Then we need to select last 10 rows out of those found 40, so outer query selects TOP(10) with ORDER BY in the opposite direction. As is this will return rows 40 to 31 in reverse order. You can sort them back into correct order on the client, or add one more outer query, which simply sorts them again by SentDate DESC. Like this:

SELECT
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP (@VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM
            (
                SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
                    [Extent1].[LinkID] AS [LinkID]
                    ,[Extent1].[Title] AS [Title]
                    ,[Extent1].[Url] AS [Url]
                    ,[Extent1].[Description] AS [Description]
                    ,[Extent1].[SentDate] AS [SentDate]
                    ,[Extent1].[VisitCount] AS [VisitCount]
                    ,[Extent1].[RssSourceId] AS [RssSourceId]
                    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
                    ,[Extent1].[UserAccountId] AS [UserAccountId]
                    ,[Extent1].[CreationDate] AS [CreationDate]
                FROM [dbo].[Links] AS [Extent1]
                ORDER BY [Extent1].[SentDate] DESC
            ) AS [Extent1]
        ORDER BY [Extent1].[SentDate] ASC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

This query (as original query) would work always correctly only if SentDate is unique. If it is not unique, add unique column to the ORDER BY. For example, if LinkID is unique, then in the inner-most query use ORDER BY SentDate DESC, LinkID DESC. In the outer query reverse the order: ORDER BY SentDate ASC, LinkID ASC.

Obviously, if you want to jump to page 1000, then the inner query would have to read 10,000 rows, so the further you go, the slower it gets.

In any case, you need to have an index on SentDate (or SentDate, LinkID) to make it work. Without an index the query would scan the whole table again.

I'm not telling you here how to translate this query to EF, because I don't know. I never used EF. There may be a way. Also, apparently, you can just force it to use actual SQL, rather than trying to play with C# code.

Update

Execution plans comparison

In my database I have a table EventLogErrors with 29,477,859 rows and I compared on SQL Server 2008 the query with ROW_NUMBER that EF generates and what I suggested here with TOP. I tried to retrieve the fourth page 10 rows long. In both cases optimizer was smart enough to read only 40 rows, as you can see from the execution plans. I used a primary key column for ordering and pagination for this test. When I used another indexed column for pagination results were the same, i.e. both variants read only 40 rows. Needless to say, both variants returned results in a fraction of a second.

Variant with TOP

Variant with TOP

Variant with ROW_NUMBER

Variant with ROW_NUMBER

What it all means is that the root of your problem is somewhere else. You mentioned that your query runs slowly only sometimes and I didn't really pay attention to it originally. With such symptom I would do the following:

  • Check execution plan.
  • Check that you do have an index.
  • Check that the index is not heavily fragmented and statistics is not outdated.
  • The SQL Server has a feature called Auto-Parameterization. Also, it has a feature called Parameter Sniffing. Also, it has a feature called Execution plan caching. When all three features work together it may result in using a non-optimal execution plan. There is an excellent article by Erland Sommarskog explaining it in detail: http://www.sommarskog.se/query-plan-mysteries.html This article explains how to confirm that the problem is really in parameter sniffing by checking the cached execution plan and what can be done to fix the problem.
like image 83
Vladimir Baranov Avatar answered Oct 21 '22 11:10

Vladimir Baranov


I'm guessing the WHERE row_number > 0 will change over time as you ask for page 2, page 3, etc...

As such, I'm curious if it would help to create this index:

CREATE INDEX idx_links_SentDate_desc ON [dbo].[Links] ([SentDate] DESC)

In all honesty, IF it works, it's pretty much a band-aid and you'll probably will need to rebuild this index on a frequent basis as I'm guessing it will get fragmented over time...

UPDATE: check the comments! Turns out the DESC has no effect whatsoever and should be avoided if your data comes in low to high!

like image 41
deroby Avatar answered Oct 21 '22 12:10

deroby