I use Microsoft SQL Server 2008 (SP1, x64). I have two queries that do the same, or so I think, but they are have completely different query plans and performance.
Query 1:
SELECT c_pk
FROM table_c
WHERE c_b_id IN (SELECT b_id FROM table_b WHERE b_z = 1)
OR c_a_id IN (SELECT a_id FROM table_a WHERE a_z = 1)
Query 2:
SELECT c_pk
FROM table_c
LEFT JOIN (SELECT b_id FROM table_b WHERE b_z = 1) AS b ON c_b_id = b_id
LEFT JOIN (SELECT a_id FROM table_a WHERE a_z = 1) AS a ON c_a_id = a_id
WHERE b_id IS NOT NULL
OR a_id IS NOT NULL
Query 1 is fast as I would expect, whereas query 2 is very slow. The query plans look quite different.
I would like query 2 to be as fast as query 1. I have software that uses query 2, and I cannot change that into query 1. I can change the database.
Some questions:
All tables have (clustered) primary keys and proper indexes on all columns:
CREATE TABLE table_a (
a_pk int NOT NULL PRIMARY KEY,
a_id int NOT NULL UNIQUE,
a_z int
)
GO
CREATE INDEX IX_table_a_z ON table_a (a_z)
GO
CREATE TABLE table_b (
b_pk int NOT NULL PRIMARY KEY,
b_id int NOT NULL UNIQUE,
b_z int
)
GO
CREATE INDEX IX_table_b_z ON table_b (b_z)
GO
CREATE TABLE table_c (
c_pk int NOT NULL PRIMARY KEY,
c_a_id int,
c_b_id int
)
GO
CREATE INDEX IX_table_c_a_id ON table_c (c_a_id)
GO
CREATE INDEX IX_table_c_b_id ON table_c (c_b_id)
GO
The tables are not modified after filling initially. I'm the only one querying them. They contains millions of records (table_a: 5M, table_b: 4M, table_c: 12M), but using only 1% gives similar results.
Edit: I tried adding FOREIGN KEYs for c_a_id
and c_b_id
, but that only made query 1 slower...
I hope someone can have a look at the query plans and explain the difference.
The SQL AND, OR and NOT Operators The AND and OR operators are used to filter records based on more than one condition: The AND operator displays a record if all the conditions separated by AND are TRUE. The OR operator displays a record if any of the conditions separated by OR is TRUE.
The obvious difference is in what they are designed for: SQL is a query language used for manipulating data stored in a database. T-SQL is also a query language, but it's an extension of SQL that is primarily used in Microsoft SQL Server databases and software.
The SQL OR condition is used to test multiple conditions in a SELECT, INSERT, UPDATE, or DELETE statement.
Join are slower, let me say by design. First query uses a sub-query (cacheable) to filter records so it'll produce less data (and less accesses to each table).
Did you read these:
What I mean is that with IN the DB can do better optimizations like removing duplicates, stop at first match and similar (and these are from school memories so I'm sure it'll do much better). So I guess the question isn't why QP is different but how smart how deep optimizations can go.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With