Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

select statment performance degradation when using DISTINCT with parameters

Note for bounty - START:

PARAMETERS SNIFFING (that is the only "idea" that was reported in pre-bounty questions) is not the issue here, as you can read in the "update" section at the end of the question. The problem is really related to how sql server creates execution plans for a parametrized query when distinct is used. I uploaded a very simple database backup (it works with sql server 2008 R2) here (you must wait 20 seconds before downloading). Against this DB you can try to run the following queries:

-- PARAMETRIZED QUERY

declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50

SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = @IS_ADMINISTRATOR OR  ROL.USER_ID = @USER_ID

-- NON PARAMETRIZED QUERY

SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!! 
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = 1 OR  ROL.USER_ID = 50

Final note: I noticed DSTINCT is the problem, my goal is to achieve the same speed (or at least almost the same speed) in both queries.

Note for bounty - END:


Original question:

I noticed that there is an heavy difference in performance between

-- Case A
select distinct * from table where id > 1

compared to (this is the sql generated by my Delphi application)

-- Case B1
exec sp_executesql N'select distinct * from table where id > @P1',N'@P1 int',1

that is equivalent to

-- Case B2
declare @P1 int
set @P1 = 1
select distinct * from table where id > @P1

A performs much faster than B1 and B2. The performance becomes the same in case I remove DISTINCT.

May you comment on this?

Here i posted a trivial query, I noticed this on a query with 3 INNER JOIN. Anyway not a complex query.

Note: I was expecting to have THE EXACT SAME PERFORMANCE, in cases A and B1/B2.

So are there some caveats in using DISTINCT?

UPDATE:

I tried to disable parameter sniffing using DBCC TRACEON (4136, -1) (the flag to disable parameter sniffing) but nothing changes. So in this case the problem is NOT LINKED TO PARAMETERS SNIFFING. Any idea?

like image 466
LaBracca Avatar asked Oct 13 '22 19:10

LaBracca


2 Answers

The problem isn't that DISTINCT is causing a performance degradation with parameters, it's that the rest of the query isn't being optimized away in the parameterized query because the optimizer won't just optimize away all of the joins using 1=@IS_ADMINISTRATOR like it will with just 1=1. It won't optimize the joins away without distinct because it needs to return duplicates based on the result of the joins.

Why? Because the execution plan tossing out all of the joins would be invalid for any value other than @IS_ADMINISTRATOR = 1. It will never generate that plan regardless of whether you are caching plans or not.

This performs as well as the non parameterized query on my 2008 server:

-- PARAMETRIZED QUERY

declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50

IF 1 = @IS_ADMINISTRATOR 
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = 1
END
ELSE 
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  ROL.USER_ID = @USER_ID
END

What's clear from the query plan I see running your example is that @IS_ADMINISTRATOR = 1 does not get optimized out the same as 1=1. In your non-parameterized example, the JOINS are completely optimized out, and it just returns every id in the DOCUMENTS table (very simple).

There are also different optimizations missing when @IS_ADMINISTRATOR <> 1. For instance, the LEFT OUTER JOINS are automatically changed to INNER JOINs without that OR clause, but they are left as-is with that or clause.

See also this answer: SQL LIKE % FOR INTEGERS for a dynamic SQL alternative.

Of course, this doesn't really explain the performance difference in your original question, since you don't have the OR in there. I assume that was an oversight.

like image 178
Kevin Stricker Avatar answered Oct 18 '22 02:10

Kevin Stricker


But also see "parameter sniffing" issue.

Why does a parameterized query produces vastly slower query plan vs non-parameterized query

https://groups.google.com/group/microsoft.public.sqlserver.programming/msg/1e4a2438bed08aca?hl=de

like image 26
Tim Avatar answered Oct 18 '22 01:10

Tim