
Can I use a Filtered Index for querying "recently modified" rows

I am using SQL Server 2008 R2, but I'd also be interested in a more general answer.

I have a table with hundreds of millions of rows, each with a DateModified column of type datetime2(7).

I poll that table very frequently with a query like:

    select * from [table] where DateModified > @P1

The parameter is always recent (within the last few minutes, say), so probably only a few rows match it.

It occurs to me that I'm maintaining a large index over the whole table, even though this query will never use most of its entries. That sounds like a perfect use for a filtered index that covers only the rows I could possibly be querying.

But what would the filter look like in this case? My only idea was to filter on

    where DateModified > [yesterday]

where [yesterday] is a date literal. But then I'd have to redefine the filter periodically, or its advantage would diminish over time.
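For illustration, a static filter of that kind might look like the sketch below. The table name dbo.MyTable, the index name, and both dates are hypothetical placeholders; the point is only that the filter compares the column with a constant, and that a poll whose predicate lies inside that range is a candidate for the index.

```sql
-- Hypothetical names: dbo.MyTable and IX_MyTable_DateModified_Recent.
-- Only rows modified after the literal date are stored in this index.
CREATE NONCLUSTERED INDEX IX_MyTable_DateModified_Recent
    ON dbo.MyTable (DateModified)
    WHERE DateModified > '20130322';

-- A poll whose predicate falls entirely inside the filter range
-- can be answered from the filtered index (plus key lookups for SELECT *).
SELECT *
FROM dbo.MyTable
WHERE DateModified > '20130323 11:55:00';
```

As the filter date falls further behind, more rows fall inside it, which is exactly the drift described above.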

On a whim I tried DateModified > DATEADD(d,-1,GETDATE()), but that gave a nondescript error; I wasn't really expecting it to work anyway.

Is there some other way to accomplish this?

Finally, if there is a way to do this, should I expect the statistics to be wildly wrong in my situation, and would that hurt query performance?

My concern about the stats comes from this article.

I'm trying to propagate changes from one system to another, with some disconnected data. If you'd like to suggest a completely different approach than polling DateModified, I'd be happy to consider it.

Asked by TCC, Mar 23 '23 12:03


1 Answer

I had a similar requirement a while back and found that functions aren't allowed in a filtered index's WHERE clause.

What you can do is script out the index and schedule the script as a job that runs during off-peak (perhaps nightly) hours. This also takes care of the statistics issue, because the index statistics are recreated every time the index is created.

Here's an example of what we wound up doing:

    CREATE TABLE FilteredTest (
    TestDate datetime
    );

Then run this on a schedule to recreate the index over just the newest rows:

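    -- Style 126 yields an ISO 8601 string (yyyy-mm-ddThh:mi:ss.mmm), which is
    -- parsed the same way regardless of the session's language/DATEFORMAT.
    DECLARE @sql varchar(8000) = '
    IF EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = ''IX_FilteredTest_TestDate''
                 AND object_id = OBJECT_ID(''FilteredTest''))
        DROP INDEX IX_FilteredTest_TestDate ON FilteredTest;

    CREATE NONCLUSTERED INDEX IX_FilteredTest_TestDate ON FilteredTest (
        TestDate
    )
    WHERE TestDate > ''' + CONVERT(varchar(25), DATEADD(d, -1, GETDATE()), 126) + ''';';

    -- Drop and recreate the index so its filter covers only the last day.
    EXEC (@sql);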
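One caveat when pairing this with the parameterized polling query from the question: SQL Server generally won't pick a filtered index for a predicate like `TestDate > @P1`, because a cached plan must be valid for any parameter value, including values outside the filter. A common workaround, sketched below, is OPTION (RECOMPILE), which compiles the plan for the parameter's actual value. The @P1 value here is a hypothetical stand-in for the poll's "modified since" value; FilteredTest and IX_FilteredTest_TestDate are the names from the example above.

```sql
-- Hypothetical stand-in for the poll's "modified since" value.
DECLARE @P1 datetime = DATEADD(minute, -5, GETDATE());

-- OPTION (RECOMPILE) compiles the plan for the current value of @P1, so the
-- optimizer can see that the predicate lies inside the index filter and
-- consider IX_FilteredTest_TestDate. If @P1 is older than the filter cutoff,
-- the filtered index cannot be used and another access path is chosen.
SELECT TestDate
FROM FilteredTest
WHERE TestDate > @P1
OPTION (RECOMPILE);
```

The trade-off is a compilation on every poll, which is usually cheap next to scanning an index over hundreds of millions of rows.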

Answered by brian, Apr 25 '23 18:04