Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does IF (query) take over an hour, when query takes less than a second?

Tags:

sql

sql-server

I have the following SQL:

 IF EXISTS
 (
    SELECT
        1
    FROM
        SomeTable T1
    WHERE
        SomeField = 1
    AND SomeOtherField = 1
    AND NOT EXISTS(SELECT 1 FROM SomeOtherTable T2 WHERE T2.KeyField = T1.KeyField)
)
    RAISERROR ('Blech.', 16, 1)

The SomeTable table has around 200,000 rows, and the SomeOtherTable table has about the same.

If I execute the inner SQL (the SELECT), it executes in sub-second time, returning no rows. But, if I execute the entire script (IF...RAISERROR) then it takes well over an hour. Why?

Now, obviously, the execution plan is different - I can see that in Enterprise Manager - but again, why?

I could probably do something like SELECT @num = COUNT(*) WHERE ... and then IF @num > 0 RAISERROR but... I think that's missing the point somewhat. You can only code around a bug (and it sure looks like a bug to me) if you know that it exists.


EDIT:

I should mention that I already tried re-jigging the query into an OUTER JOIN as per @Bohemian's answer, but this made no difference to the execution time.


EDIT 2:

I've attached the query plan for the inner SELECT statement:

Query Plan - inner SELECT statement

... and the query plan for the whole IF...RAISERROR block:

Query Plan - whole IF statement

Obviously these show the real table/field names, but apart from that the query is exactly as shown above.

like image 383
Gary McGill Avatar asked Mar 26 '13 10:03

Gary McGill


People also ask

Why does the same query takes different amount of time to run?

Either you are getting different performance because the system is under a different sort of load, you are getting different performance because of data volume changes, or you are getting different performance because you are getting different query plans.

Why do queries run faster the second time?

When you run your query for the first time and the data is not in cache, the server read the data from disk. It is time-comsuming. The second time you execute the same query data is already in cache so it requires less time.

How do you make a query not faster?

You can use a left outer join, or a not exists clause. Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in one.

How long should a SQL query take to execute?

The query takes 20 to 500 ms (or sometimes more) depending on the system and the amount of data. The performance of the database or the database server has a significant influence on the speed.


2 Answers

The IF does not magically turn off optimizations or damage the plan. The optimizer just noticed that EXISTS only needs one row at most (like a TOP 1). This is called a "row goal" and it normally happens when you do paging. But also with EXISTS, IN, NOT IN and such things.

My guess: if you write TOP 1 to the original query you get the same (bad) plan.

The optimizer tries to be smart here and only produce the first row using much cheaper operations. Unfortunately, it misestimates cardinality. It guesses that the query will produce lots of rows although in reality it produces none. If it estimated correctly you'd just get a more efficient plan, or it would not do the transformation at all.

I suggest the following steps:

  1. fix the plan by reviewing indexes and statistics
  2. if this didn't help, change the query to IF (SELECT COUNT(*) FROM ...) > 0 which will give the original plan because the optimizer does not have a row goal.
like image 172
usr Avatar answered Oct 16 '22 18:10

usr


It's probably because the optimizer can figure out how to turn your query into a more efficient query, but somehow the IF prevents that. Only an EXPLAIN will tell you why the query is taking so long, but I can tell you how to make this whole thing more efficient... Indtead of using a correlated subquery, which is incredibly inefficient - you get "n" subqueries run for "n" rows in the main table - use a JOIN.

Try this:

IF EXISTS (
  SELECT 1
  FROM SomeTable T1
  LEFT JOIN SomeOtherTable T2 ON T2.KeyField = T1.KeyField
  WHERE SomeField = 1
  AND SomeOtherField = 1
  AND T2.KeyField IS NULL
) RAISERROR ('Blech.', 16, 1)

The "trick" here is to use s LEFT JOIN and filter out all joined rows by testing for a null in the WHERE clause, which is executed after the join is made.

like image 28
Bohemian Avatar answered Oct 16 '22 17:10

Bohemian