I have a very simple query which is generated by Entity-Framework, Sometimes when I try to run this query It almost takes more than 30 seconds to be executed, and I got time out Exception
.
SELECT TOP (10)
[Extent1].[LinkID] AS [LinkID],
[Extent1].[Title] AS [Title],
[Extent1].[Url] AS [Url],
[Extent1].[Description] AS [Description],
[Extent1].[SentDate] AS [SentDate],
[Extent1].[VisitCount] AS [VisitCount],
[Extent1].[RssSourceId] AS [RssSourceId],
[Extent1].[ReviewStatus] AS [ReviewStatus],
[Extent1].[UserAccountId] AS [UserAccountId],
[Extent1].[CreationDate] AS [CreationDate]
FROM ( SELECT [Extent1].[LinkID] AS [LinkID], [Extent1].[Title] AS [Title], [Extent1].[Url] AS [Url], [Extent1].[Description] AS [Description], [Extent1].[SentDate] AS [SentDate], [Extent1].[VisitCount] AS [VisitCount], [Extent1].[RssSourceId] AS [RssSourceId], [Extent1].[ReviewStatus] AS [ReviewStatus], [Extent1].[UserAccountId] AS [UserAccountId], [Extent1].[CreationDate] AS [CreationDate], row_number() OVER (ORDER BY [Extent1].[SentDate] DESC) AS [row_number]
FROM [dbo].[Links] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 0
ORDER BY [Extent1].[SentDate] DESC
And the code which is generating the Query is:
public async Task<IQueryable<TEntity>> GetAsync(Expression<Func<TEntity, bool>> filter = null,
Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy = null)
{
return await Task.Run(() =>
{
IQueryable<TEntity> query = _dbSet;
if (filter != null)
{
query = query.Where(filter);
}
if (orderBy != null)
{
query = orderBy(query);
}
return query;
});
}
Note that when I remove inner Select
statement and Where
clause and change it to following, Query executes fine in a less than a second.
SELECT TOP (10)
[Extent1].[LinkID] AS [LinkID],
[Extent1].[Title] AS [Title],
.
.
.
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC
Any advice will be helpful.
UPDATE:
Here is the usage of Above code:
var dbLinks = await _uow.LinkRespository.GetAsync(filter, orderBy);
var pagedLinks = new PagedList<Link>(dbLinks, pageNumber, PAGE_SIZE);
var vmLinks = Mapper.Map<IPagedList<LinkViewItemViewModel>>(pagedLinks);
And filter:
var result = await GetLinks(null, pageNo, a => a.OrderByDescending(x => x.SentDate));
It never occurred to me that you simply didn't have an index. Lesson learnt - always check the basics before digging further.
If you don't need pagination, then the query can be simplified to
SELECT TOP (10)
[Extent1].[LinkID] AS [LinkID],
[Extent1].[Title] AS [Title],
...
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC
and it runs fast, as you've verified.
Apparently, you do need the pagination, so let's see what we can do.
The reason why your current version is slow, because it scans the whole table first, calculates row number for each and every row and only then returns 10 rows. I was wrong here. SQL Server optimizer is pretty smart. The root of your problem is somewhere else. See my update below.
BTW, as other people mentioned, this pagination will work correctly only if SentDate
column is unique. If it is not unique, you need to ORDER BY SentDate
and another unique column like some ID
to resolve ambiguity.
If you don't need ability to jump straight to particular page, but rather always start with page 1, then go to next page, next page and so on, then the proper efficient way to do such pagination is described in this excellent article: http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way
The author uses PostgreSQL for illustration, but the technique works for MS SQL Server as well. It boils down to remembering the ID
of the last row on the shown page and then using this ID
in the WHERE
clause with appropriate supporting index to retrieve the next page without scanning all previous rows.
SQL Server 2008 doesn't have a built-in support for pagination, so we'll have to use workaround. I will show one variant that allows to jump straight to a given page and would work fast for first pages, but would become slower and slower for further pages.
You will have these variables (PageSize
, PageNumber
) in your C# code. I put them here to illustrate the point.
DECLARE @VarPageSize int = 10; -- number of rows in each page
DECLARE @VarPageNumber int = 3; -- page numeration is zero-based
SELECT TOP (@VarPageSize)
[Extent1].[LinkID] AS [LinkID]
,[Extent1].[Title] AS [Title]
,[Extent1].[Url] AS [Url]
,[Extent1].[Description] AS [Description]
,[Extent1].[SentDate] AS [SentDate]
,[Extent1].[VisitCount] AS [VisitCount]
,[Extent1].[RssSourceId] AS [RssSourceId]
,[Extent1].[ReviewStatus] AS [ReviewStatus]
,[Extent1].[UserAccountId] AS [UserAccountId]
,[Extent1].[CreationDate] AS [CreationDate]
FROM
(
SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
[Extent1].[LinkID] AS [LinkID]
,[Extent1].[Title] AS [Title]
,[Extent1].[Url] AS [Url]
,[Extent1].[Description] AS [Description]
,[Extent1].[SentDate] AS [SentDate]
,[Extent1].[VisitCount] AS [VisitCount]
,[Extent1].[RssSourceId] AS [RssSourceId]
,[Extent1].[ReviewStatus] AS [ReviewStatus]
,[Extent1].[UserAccountId] AS [UserAccountId]
,[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC
) AS [Extent1]
ORDER BY [Extent1].[SentDate] ASC
;
The first page is rows 1 to 10, second page is 11 to 20 and so on.
Let's see how this query works when we try to get the fourth page, i.e. rows 31 to 40. PageSize=10
, PageNumber=3
. In the inner query we select first 40 rows. Note, that we don't scan the whole table here, we scan only first 40 rows. We don't even need explicit ROW_NUMBER()
. Then we need to select last 10 rows out of those found 40, so outer query selects TOP(10)
with ORDER BY
in the opposite direction. As is this will return rows 40 to 31 in reverse order. You can sort them back into correct order on the client, or add one more outer query, which simply sorts them again by SentDate DESC
. Like this:
SELECT
[Extent1].[LinkID] AS [LinkID]
,[Extent1].[Title] AS [Title]
,[Extent1].[Url] AS [Url]
,[Extent1].[Description] AS [Description]
,[Extent1].[SentDate] AS [SentDate]
,[Extent1].[VisitCount] AS [VisitCount]
,[Extent1].[RssSourceId] AS [RssSourceId]
,[Extent1].[ReviewStatus] AS [ReviewStatus]
,[Extent1].[UserAccountId] AS [UserAccountId]
,[Extent1].[CreationDate] AS [CreationDate]
FROM
(
SELECT TOP (@VarPageSize)
[Extent1].[LinkID] AS [LinkID]
,[Extent1].[Title] AS [Title]
,[Extent1].[Url] AS [Url]
,[Extent1].[Description] AS [Description]
,[Extent1].[SentDate] AS [SentDate]
,[Extent1].[VisitCount] AS [VisitCount]
,[Extent1].[RssSourceId] AS [RssSourceId]
,[Extent1].[ReviewStatus] AS [ReviewStatus]
,[Extent1].[UserAccountId] AS [UserAccountId]
,[Extent1].[CreationDate] AS [CreationDate]
FROM
(
SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
[Extent1].[LinkID] AS [LinkID]
,[Extent1].[Title] AS [Title]
,[Extent1].[Url] AS [Url]
,[Extent1].[Description] AS [Description]
,[Extent1].[SentDate] AS [SentDate]
,[Extent1].[VisitCount] AS [VisitCount]
,[Extent1].[RssSourceId] AS [RssSourceId]
,[Extent1].[ReviewStatus] AS [ReviewStatus]
,[Extent1].[UserAccountId] AS [UserAccountId]
,[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC
) AS [Extent1]
ORDER BY [Extent1].[SentDate] ASC
) AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC
This query (as original query) would work always correctly only if SentDate
is unique. If it is not unique, add unique column to the ORDER BY
. For example, if LinkID
is unique, then in the inner-most query use ORDER BY SentDate DESC, LinkID DESC
. In the outer query reverse the order: ORDER BY SentDate ASC, LinkID ASC
.
Obviously, if you want to jump to page 1000, then the inner query would have to read 10,000 rows, so the further you go, the slower it gets.
In any case, you need to have an index on SentDate
(or SentDate, LinkID
) to make it work. Without an index the query would scan the whole table again.
I'm not telling you here how to translate this query to EF, because I don't know. I never used EF. There may be a way. Also, apparently, you can just force it to use actual SQL, rather than trying to play with C# code.
Execution plans comparison
In my database I have a table EventLogErrors
with 29,477,859 rows and I compared on SQL Server 2008 the query with ROW_NUMBER
that EF generates and what I suggested here with TOP
. I tried to retrieve the fourth page 10 rows long. In both cases optimizer was smart enough to read only 40 rows, as you can see from the execution plans. I used a primary key column for ordering and pagination for this test. When I used another indexed column for pagination results were the same, i.e. both variants read only 40 rows. Needless to say, both variants returned results in a fraction of a second.
Variant with TOP
Variant with ROW_NUMBER
What it all means is that the root of your problem is somewhere else. You mentioned that your query runs slowly only sometimes and I didn't really pay attention to it originally. With such symptom I would do the following:
I'm guessing the WHERE row_number > 0
will change over time as you ask for page 2, page 3, etc...
As such, I'm curious if it would help to create this index:
CREATE INDEX idx_links_SentDate_desc ON [dbo].[Links] ([SentDate] DESC)
In all honesty, IF it works, it's pretty much a band-aid and you'll probably will need to rebuild this index on a frequent basis as I'm guessing it will get fragmented over time...
UPDATE: check the comments! Turns out the DESC
has no effect whatsoever and should be avoided if your data comes in low to high!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With