Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UNION ALL vs OR condition in sql server query

Tags:

sql

sql-server

I have to select some rows based on a not exists condition on a table. If I use a union all as below, it gets executed in less than 1 second.

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(

SELECT 1 FROM TABLE t
WHERE Data1 = t.Col1 AND Data2=t.Col2

UNION ALL

SELECT 1 FROM TABLE t
WHERE Data1 = t.Col2 AND Data2=t.Col1

)

but if I use an OR condition, it takes close to a minute as SQL server is doing a table lazy pool. Can someone explain it?

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(

SELECT 1 FROM TABLE t
WHERE ( (Data1 = t.Col1 AND Data2=t.Col2) OR (Data1 = t.Col2 AND Data2=t.Col1))
)
like image 645
coder net Avatar asked Apr 12 '11 18:04

coder net


3 Answers

The issue is that you are specifying two conditions with OR that apply to separate tables in your query. Because of this, the nonclustered index seek has to return most or all of the rows in your big table because OR logic means they might also match the condition clause in the second table.

Look at the SQL execution plan in all three examples above, and notice the number of rows that come out of the nonclustered index seek from the big table. The ultimate result may only return 1,000 or fewer of the 800,000 rows in the table but the OR clause means that the contents of that table have to be cross-referenced with the conditional in the second table since OR means they may be needed for the final query output.

Depending on your execution plan, the index seek may pull out all 800,000 rows in big table because they may also match the conditions of the OR clause in the second table. The UNION ALL is two separate query against one table each, so the index seek only has to output the smaller result set that might match the condition for that query.

I hope this makes sense. I've run across the same situation while refactoring slow-running SQL statements.

Cheers,

Andre Ranieri

like image 80
user1472721 Avatar answered Oct 19 '22 16:10

user1472721


The query plan is also affected by the number of rows in your tables. How many rows are there in table t ?

You could also try:

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col1 AND Data2=t.Col2
)
AND NOT EXISTS 
(    
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col2 AND Data2=t.Col1    
)

or (corrected for SQL-Server) this that will use the index:

WITH tt AS                               <---- a temp table with 2 rows
( SELECT Data1 AS Col1, Data2 AS Col2
  UNION
  SELECT Data2 AS Col1, Data1 AS Col2
)
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1
  FROM TABLE t
    JOIN tt                      
      ON tt.Col1 = t.Col1 AND tt.Col2=t.Col2
)
like image 3
ypercubeᵀᴹ Avatar answered Oct 19 '22 16:10

ypercubeᵀᴹ


The usage of the OR is probably causing the query optimizer to no longer use an index in the second query. Look at the explain for each query and that will tell you the answer.

like image 1
Wes Avatar answered Oct 19 '22 17:10

Wes