Note for bounty - START:
PARAMETERS SNIFFING (that is the only "idea" that was reported in pre-bounty questions) is not the issue here, as you can read in the "update" section at the end of the question. The problem is really related to how sql server creates execution plans for a parametrized query when distinct is used. I uploaded a very simple database backup (it works with sql server 2008 R2) here (you must wait 20 seconds before downloading). Against this DB you can try to run the following queries:
-- PARAMETRIZED QUERY
declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
1 = @IS_ADMINISTRATOR OR ROL.USER_ID = @USER_ID
-- NON PARAMETRIZED QUERY
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
1 = 1 OR ROL.USER_ID = 50
Final note: I noticed DSTINCT is the problem, my goal is to achieve the same speed (or at least almost the same speed) in both queries.
Note for bounty - END:
Original question:
I noticed that there is an heavy difference in performance between
-- Case A
select distinct * from table where id > 1
compared to (this is the sql generated by my Delphi application)
-- Case B1
exec sp_executesql N'select distinct * from table where id > @P1',N'@P1 int',1
that is equivalent to
-- Case B2
declare @P1 int
set @P1 = 1
select distinct * from table where id > @P1
A performs much faster than B1 and B2. The performance becomes the same in case I remove DISTINCT.
May you comment on this?
Here i posted a trivial query, I noticed this on a query with 3 INNER JOIN. Anyway not a complex query.
Note: I was expecting to have THE EXACT SAME PERFORMANCE, in cases A and B1/B2.
So are there some caveats in using DISTINCT?
UPDATE:
I tried to disable parameter sniffing using DBCC TRACEON (4136, -1)
(the flag to disable parameter sniffing) but nothing changes. So in this case the problem is NOT LINKED TO PARAMETERS SNIFFING. Any idea?
The problem isn't that DISTINCT is causing a performance degradation with parameters, it's that the rest of the query isn't being optimized away in the parameterized query because the optimizer won't just optimize away all of the joins using 1=@IS_ADMINISTRATOR like it will with just 1=1. It won't optimize the joins away without distinct because it needs to return duplicates based on the result of the joins.
Why? Because the execution plan tossing out all of the joins would be invalid for any value other than @IS_ADMINISTRATOR = 1. It will never generate that plan regardless of whether you are caching plans or not.
This performs as well as the non parameterized query on my 2008 server:
-- PARAMETRIZED QUERY
declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50
IF 1 = @IS_ADMINISTRATOR
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
1 = 1
END
ELSE
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
ROL.USER_ID = @USER_ID
END
What's clear from the query plan I see running your example is that @IS_ADMINISTRATOR = 1
does not get optimized out the same as 1=1
. In your non-parameterized example, the JOINS are completely optimized out, and it just returns every id in the DOCUMENTS table (very simple).
There are also different optimizations missing when @IS_ADMINISTRATOR <> 1
. For instance, the LEFT OUTER JOIN
S are automatically changed to INNER JOIN
s without that OR
clause, but they are left as-is with that or clause.
See also this answer: SQL LIKE % FOR INTEGERS for a dynamic SQL alternative.
Of course, this doesn't really explain the performance difference in your original question, since you don't have the OR in there. I assume that was an oversight.
But also see "parameter sniffing" issue.
Why does a parameterized query produces vastly slower query plan vs non-parameterized query
https://groups.google.com/group/microsoft.public.sqlserver.programming/msg/1e4a2438bed08aca?hl=de
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With