I have to select some rows based on a not exists condition on a table. If I use a union all as below, it gets executed in less than 1 second.
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
SELECT 1 FROM TABLE t
WHERE Data1 = t.Col1 AND Data2=t.Col2
UNION ALL
SELECT 1 FROM TABLE t
WHERE Data1 = t.Col2 AND Data2=t.Col1
)
but if I use an OR condition, it takes close to a minute as SQL server is doing a table lazy pool. Can someone explain it?
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
SELECT 1 FROM TABLE t
WHERE ( (Data1 = t.Col1 AND Data2=t.Col2) OR (Data1 = t.Col2 AND Data2=t.Col1))
)
The issue is that you are specifying two conditions with OR that apply to separate tables in your query. Because of this, the nonclustered index seek has to return most or all of the rows in your big table because OR logic means they might also match the condition clause in the second table.
Look at the SQL execution plan in all three examples above, and notice the number of rows that come out of the nonclustered index seek from the big table. The ultimate result may only return 1,000 or fewer of the 800,000 rows in the table but the OR clause means that the contents of that table have to be cross-referenced with the conditional in the second table since OR means they may be needed for the final query output.
Depending on your execution plan, the index seek may pull out all 800,000 rows in big table because they may also match the conditions of the OR clause in the second table. The UNION ALL is two separate query against one table each, so the index seek only has to output the smaller result set that might match the condition for that query.
I hope this makes sense. I've run across the same situation while refactoring slow-running SQL statements.
Cheers,
Andre Ranieri
The query plan is also affected by the number of rows in your tables. How many rows are there in table t
?
You could also try:
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
SELECT 1 FROM TABLE t
WHERE Data1 = t.Col1 AND Data2=t.Col2
)
AND NOT EXISTS
(
SELECT 1 FROM TABLE t
WHERE Data1 = t.Col2 AND Data2=t.Col1
)
or (corrected for SQL-Server) this that will use the index:
WITH tt AS <---- a temp table with 2 rows
( SELECT Data1 AS Col1, Data2 AS Col2
UNION
SELECT Data2 AS Col1, Data1 AS Col2
)
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
SELECT 1
FROM TABLE t
JOIN tt
ON tt.Col1 = t.Col1 AND tt.Col2=t.Col2
)
The usage of the OR
is probably causing the query optimizer to no longer use an index in the second query. Look at the explain for each query and that will tell you the answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With