I have a table named Workflow. It has 37M rows in it. There is a primary key on the ID column (int) plus an additional column. The ID column is the first column in the index.
If I execute the following query, the PK is not used (unless I use an index hint)
Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID
If I execute this query instead, the PK is used
Select Distinct(SubID) From Workflow Where ID >= 786400000
I suspect the problem is with using the parameter value in the query (which I have to do). I really don't want to use an index hint. Is there a workaround for this?
Please post the execution plan(s), as well as the exact table definition, including all indexes.
When you use a variable the optimizer does no know what selectivity the query will have, the @LastSeenWorkflowID may filter out all but very last few rows in Workflow, or it may include them all. The generated plan has to work in both situations. There is a threshold at which the range seek over the clustered index is becoming more expensive than a full scan over a non-clustered index, simply because the clustered index is so much wider (it includes every column in the leaf levels) and thus has so much more pages to iterate over. The plan generated, which considers an unknown value for @LastSeenWorkflowID, is likely crossing that threshold in estimating the cost of the clustered index seek and as such it chooses the scan over the non-clustered index.
You could provide a narrow index that is aimed specifically at this query:
CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);
or:
CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);
Such an index is too-good-to-pass for your query, no matter the value of @LastSeenWorkflowID.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With