I have the following SQL:
IF EXISTS
(
SELECT
1
FROM
SomeTable T1
WHERE
SomeField = 1
AND SomeOtherField = 1
AND NOT EXISTS(SELECT 1 FROM SomeOtherTable T2 WHERE T2.KeyField = T1.KeyField)
)
RAISERROR ('Blech.', 16, 1)
The SomeTable
table has around 200,000 rows, and the SomeOtherTable
table has about the same.
If I execute the inner SQL (the SELECT
), it executes in sub-second time, returning no rows. But, if I execute the entire script (IF...RAISERROR
) then it takes well over an hour. Why?
Now, obviously, the execution plan is different - I can see that in Enterprise Manager - but again, why?
I could probably do something like SELECT @num = COUNT(*) WHERE
... and then IF @num > 0 RAISERROR
but... I think that's missing the point somewhat. You can only code around a bug (and it sure looks like a bug to me) if you know that it exists.
EDIT:
I should mention that I already tried re-jigging the query into an OUTER JOIN as per @Bohemian's answer, but this made no difference to the execution time.
EDIT 2:
I've attached the query plan for the inner SELECT
statement:
... and the query plan for the whole IF...RAISERROR
block:
Obviously these show the real table/field names, but apart from that the query is exactly as shown above.
Either you are getting different performance because the system is under a different sort of load, you are getting different performance because of data volume changes, or you are getting different performance because you are getting different query plans.
When you run your query for the first time and the data is not in cache, the server read the data from disk. It is time-comsuming. The second time you execute the same query data is already in cache so it requires less time.
You can use a left outer join, or a not exists clause. Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in one.
The query takes 20 to 500 ms (or sometimes more) depending on the system and the amount of data. The performance of the database or the database server has a significant influence on the speed.
The IF
does not magically turn off optimizations or damage the plan. The optimizer just noticed that EXISTS
only needs one row at most (like a TOP 1
). This is called a "row goal" and it normally happens when you do paging. But also with EXISTS
, IN
, NOT IN
and such things.
My guess: if you write TOP 1
to the original query you get the same (bad) plan.
The optimizer tries to be smart here and only produce the first row using much cheaper operations. Unfortunately, it misestimates cardinality. It guesses that the query will produce lots of rows although in reality it produces none. If it estimated correctly you'd just get a more efficient plan, or it would not do the transformation at all.
I suggest the following steps:
IF (SELECT COUNT(*) FROM ...) > 0
which will give the original plan because the optimizer does not have a row goal.It's probably because the optimizer can figure out how to turn your query into a more efficient query, but somehow the IF prevents that. Only an EXPLAIN will tell you why the query is taking so long, but I can tell you how to make this whole thing more efficient... Indtead of using a correlated subquery, which is incredibly inefficient - you get "n" subqueries run for "n" rows in the main table - use a JOIN.
Try this:
IF EXISTS (
SELECT 1
FROM SomeTable T1
LEFT JOIN SomeOtherTable T2 ON T2.KeyField = T1.KeyField
WHERE SomeField = 1
AND SomeOtherField = 1
AND T2.KeyField IS NULL
) RAISERROR ('Blech.', 16, 1)
The "trick" here is to use s LEFT JOIN and filter out all joined rows by testing for a null in the WHERE clause, which is executed after the join is made.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With