Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need help understanding some execution plans

Assume a simple table defined as:

CREATE TABLE Table1
(
[ID] [bigint] NOT NULL IDENTITY(1, 1) NOT FOR REPLICATION,
[State] [tinyint] NOT NULL DEFAULT ((0))
)
ALTER TABLE Table1 ADD CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED  ([ID])
CREATE NONCLUSTERED INDEX [IX_NC_F_Media_StateNotDeleted] ON Table1 ([State]) WHERE ([State]<>(2))
CREATE NONCLUSTERED INDEX [IX_NC_F_Media_State] ON Table1 ([State]) WHERE ([State]=(0))

Values are inserted as follows:

250000 rows with State = 0
1000 rows with State = 5

And the following queries with their respective execution plans:

declare @mID int = 400000;
select State 
from Table1 
where (ID = @mID and State in (0, 1, 4, 5))
   or (@mID IS NULL and State in (0, 4))

enter image description here

Considering that ID is not null and therefore @mid IS NULL is mutually exclusive with ID = @mID, I rewrote the query as:

declare @mID int = 400000;
if @mID is null
begin
  select State 
  from Table1 
  where State in (0, 4)
end
else
begin
  select State 
  from Table1 
  where ID=@mID and State in (0, 1, 4, 5)
end

enter image description here

Questions:

  1. Why the difference in execution plan between these two cases? Why is the nonclustered index only used in the second case?
  2. Although the second case performs a seek vs. a scan for the case when @mID is null, does it really make any difference? The tooltip indicates that the performance hit is pretty much the same and I'm guessing this is due to the data being mostly State = 0 rows.
like image 315
Tudor Avatar asked Jun 07 '18 09:06

Tudor


People also ask

What is the execution plan?

An execution plan is generated when you execute any query which necessarily includes the query along with the plan. Running an execution plan for a particular query provides insights into the SQL Server query optimizer and query engine.

Why should you examine your execution plan?

If you can examine and understand execution plans, you can achieve better understanding of the database system and you will write better database code.


1 Answers

Normally, when optimiser generates a plan for the query this plan must be valid for any possible values of parameters. Normally, the plan is cached and it is not re-generated when you run the same query again, so it must remain valid (produce correct result) even if you re-run the query with a different value of the parameter.

So, the plan for the first query must have such shape that would work for any value of the @mID, including NULL and not-NULL.

Index IX_NC_F_Media_StateNotDeleted can be used to find values for both parts of the OR expression, but either optimiser is not smart enough to build a plan that does two seeks of this index and then unions the result, or optimiser decided that such plan would be more expensive.

So, either optimiser is not capable of seeing here that @mid IS NULL is mutually exclusive with ID = @mID, or it decides that alternative would be more expensive.

The second query with explicit IF makes optimiser's choice obvious.


This type of query is called "catch-all" or "sink" query. I recommend to read an excellent article Dynamic Search Conditions in T‑SQL by Erland Sommarskog.

In many cases it is appropriate to add an OPTION (RECOMPILE) to your first query, like this:

declare @mID int = 400000;
select State 
from Table1 
where (ID = @mID and State in (0, 1, 4, 5))
   or (@mID IS NULL and State in (0, 4))
OPTION(RECOMPILE);

SET @mID = NULL;
select State 
from Table1 
where (ID = @mID and State in (0, 1, 4, 5))
   or (@mID IS NULL and State in (0, 4))
OPTION(RECOMPILE);

Try to run these queries and examine their actual execution plans. You should see that the plan's shape changes according to the actual value of the parameter at the time of execution.

With OPTION(RECOMPILE) optimiser knows that the generated plan will not be cached as usual, so it takes the actual values of the parameters and inlines them as constants into the query. Once they are constants, here, the optimiser is able to see that NULL IS NULL is always true (or 400000 IS NULL is always false) and collapses/simplifies the logic expression. Besides, optimiser is able to pick the best appropriate indexes in each case.


Although the second case performs a seek vs. a scan for the case when @mID is null, does it really make any difference? The tooltip indicates that the performance hit is pretty much the same and I'm guessing this is due to the data being mostly State = 0 rows.

In this case seeking an index is pretty much the same as scanning the whole table. Their sizes (in pages) are the same. If your table had a lot of rows with State=2, then filtered index would be more efficient, since it would contain less pages than the main table.

like image 182
Vladimir Baranov Avatar answered Oct 21 '22 04:10

Vladimir Baranov