Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why assigning count of a query to a variable performed better than checking it directly?

Recently I had a performance tuning experience and I want to share it here and trying to understand why this improvement happened.

In one of my procs, I wanted to return a data set based on existence of some other records.

My query:

IF (SELECT COUNT(1) FROM ...) > 0
    SELECT …

This query was taking around 5 seconds.

I made a change and assigned the output of IF statement to a variable then check it.

DECLARE @cnt INT = 0
SELECT @cnt = COUNT(1) FROM …

IF @cnt > 0
    SELECT …

This one takes less than 1 sec to run.

Also I tried IF EXISTS as well but got the same result before the improvements (5 sec).

I am very keen to know why the compiler behaves that much different and if there is any particular answer for that.

Thank you

like image 455
sqluser Avatar asked Oct 19 '22 17:10

sqluser


1 Answers

There are two parts here.

1) SQL Server optimizer converts

IF (SELECT COUNT(1) FROM ...) > 0
    SELECT …

into

IF EXISTS(SELECT 1 FROM ...)
    SELECT …

I've seen that this was pointed out by Adam Machanic in his comment to the post by Andrew Kelly Exists Vs. Count(*) - The battle never ends:

It's interesting to note that in SQL Server 2005 if there is a index available to allow a seek, the COUNT(*) > 0 test will be optimized and behave the same as EXISTS.

Adam provided there a demo.


2) Sometimes EXISTS is worse than COUNT:

IF EXISTS taking longer than embedded select statement

Check existence with EXISTS outperform COUNT! … Not?

As Paul White wrote:

Using EXISTS introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing this, it assumes that the data is uniformly distributed. For example, if statistics show there are 100 expected matches in 100,000 rows, it will assume it will have to read only 1,000 rows to find the first match.

This will result in longer than expected execution times if this assumption turns out to be faulty. For example, if SQL Server chooses an access method (e.g. unordered scan) that happens to locate the first matching value very late on in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found amongst the first few rows, performance will be very good. This is the fundamental risk with row goals - inconsistent performance.


If your data distribution is skewed or if you expect that in majority of cases the COUNT would be zero (i.e. you have to scan the whole table anyway to get the answer), then you should try to get the plan without a row goal (i.e. without EXISTS).

One obvious way as you found already is to save results of COUNT into a variable.

like image 176
Vladimir Baranov Avatar answered Nov 15 '22 05:11

Vladimir Baranov