
What's the difference between these T-SQL queries using OR?

I use Microsoft SQL Server 2008 (SP1, x64). I have two queries that, as far as I can tell, do the same thing, but they have completely different query plans and performance.

Query 1:

SELECT c_pk
FROM table_c
WHERE c_b_id IN (SELECT b_id FROM table_b WHERE b_z = 1)
  OR  c_a_id IN (SELECT a_id FROM table_a WHERE a_z = 1)

Query 2:

SELECT c_pk
FROM table_c
LEFT JOIN (SELECT b_id FROM table_b WHERE b_z = 1) AS b ON c_b_id = b_id
LEFT JOIN (SELECT a_id FROM table_a WHERE a_z = 1) AS a ON c_a_id = a_id
WHERE b_id IS NOT NULL
  OR  a_id IS NOT NULL

Query 1 is fast as I would expect, whereas query 2 is very slow. The query plans look quite different.
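To put numbers on "fast" and "very slow", both queries can be run in one session with I/O and timing statistics switched on. This is only a sketch of how the comparison might be made; it assumes the two queries exactly as written above and a tool such as SSMS or sqlcmd that understands the GO batch separator.

```sql
-- Sketch: report logical reads and CPU/elapsed time for each query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Query 1 (IN subqueries)
SELECT c_pk
FROM table_c
WHERE c_b_id IN (SELECT b_id FROM table_b WHERE b_z = 1)
  OR  c_a_id IN (SELECT a_id FROM table_a WHERE a_z = 1);
GO

-- Query 2 (LEFT JOINs filtered on IS NOT NULL)
SELECT c_pk
FROM table_c
LEFT JOIN (SELECT b_id FROM table_b WHERE b_z = 1) AS b ON c_b_id = b_id
LEFT JOIN (SELECT a_id FROM table_a WHERE a_z = 1) AS a ON c_a_id = a_id
WHERE b_id IS NOT NULL
  OR  a_id IS NOT NULL;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

The per-table logical reads in the Messages output show where the extra work in query 2 goes, which is easier to compare than the graphical plans alone.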

I would like query 2 to be as fast as query 1. I have software that uses query 2, and I cannot change that into query 1. I can change the database.

Some questions:

  • why are the query plans different?
  • can I "teach" SQL Server somehow that query 2 is equal to query 1?

All tables have (clustered) primary keys and proper indexes on all columns:

CREATE TABLE table_a (
  a_pk   int NOT NULL PRIMARY KEY,
  a_id   int NOT NULL UNIQUE,
  a_z    int
)
GO
CREATE INDEX IX_table_a_z ON table_a (a_z)
GO

CREATE TABLE table_b (
  b_pk   int NOT NULL PRIMARY KEY,
  b_id   int NOT NULL UNIQUE,
  b_z    int
)
GO
CREATE INDEX IX_table_b_z ON table_b (b_z)
GO

CREATE TABLE table_c (
  c_pk   int NOT NULL PRIMARY KEY,
  c_a_id int,
  c_b_id int
)
GO
CREATE INDEX IX_table_c_a_id ON table_c (c_a_id)
GO
CREATE INDEX IX_table_c_b_id ON table_c (c_b_id)
GO

The tables are not modified after the initial load. I'm the only one querying them. They contain millions of records (table_a: 5M, table_b: 4M, table_c: 12M), but using only 1% of the data gives similar results.
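Because a_id and b_id are declared UNIQUE in the schema above, each LEFT JOIN can match at most one row per table_c row, so the two queries should return the same c_pk values. A possible sanity check, sketched here with EXCEPT and best run on the 1% sample, is that both differences come back empty:

```sql
-- Rows returned by query 1 but not by query 2 (expected: none).
SELECT c_pk
FROM table_c
WHERE c_b_id IN (SELECT b_id FROM table_b WHERE b_z = 1)
  OR  c_a_id IN (SELECT a_id FROM table_a WHERE a_z = 1)
EXCEPT
SELECT c_pk
FROM table_c
LEFT JOIN (SELECT b_id FROM table_b WHERE b_z = 1) AS b ON c_b_id = b_id
LEFT JOIN (SELECT a_id FROM table_a WHERE a_z = 1) AS a ON c_a_id = a_id
WHERE b_id IS NOT NULL
  OR  a_id IS NOT NULL;
GO

-- Rows returned by query 2 but not by query 1 (expected: none).
SELECT c_pk
FROM table_c
LEFT JOIN (SELECT b_id FROM table_b WHERE b_z = 1) AS b ON c_b_id = b_id
LEFT JOIN (SELECT a_id FROM table_a WHERE a_z = 1) AS a ON c_a_id = a_id
WHERE b_id IS NOT NULL
  OR  a_id IS NOT NULL
EXCEPT
SELECT c_pk
FROM table_c
WHERE c_b_id IN (SELECT b_id FROM table_b WHERE b_z = 1)
  OR  c_a_id IN (SELECT a_id FROM table_a WHERE a_z = 1);
GO
```

Since c_pk is the primary key of table_c, the duplicate removal that EXCEPT performs does not hide any differences.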

Edit: I tried adding FOREIGN KEYs for c_a_id and c_b_id, but that only made query 1 slower...
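For reference, foreign keys of the kind described in the edit could be declared roughly as follows. The exact statements used are not shown in the question, so this is an assumption: the constraint names FK_table_c_a_id and FK_table_c_b_id are hypothetical, and WITH CHECK is chosen so that existing rows are validated and the constraints end up trusted.

```sql
-- Sketch: foreign keys from table_c to the UNIQUE columns of table_a and table_b.
-- Constraint names are hypothetical.
ALTER TABLE table_c WITH CHECK
  ADD CONSTRAINT FK_table_c_a_id FOREIGN KEY (c_a_id) REFERENCES table_a (a_id);
GO
ALTER TABLE table_c WITH CHECK
  ADD CONSTRAINT FK_table_c_b_id FOREIGN KEY (c_b_id) REFERENCES table_b (b_id);
GO

-- Check whether the constraints are trusted (is_not_trusted = 0).
SELECT name, is_not_trusted
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID('table_c');
GO
```

The optimizer can only rely on a foreign key that is trusted, so the last query is worth running before drawing conclusions from the plans.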

I hope someone can have a look at the query plans and explain the difference.

Michel de Ruiter asked Mar 13 '12 11:03




1 Answer

In this case the join form is slower, I would say by design. The first query uses its subqueries purely as filters, so it produces less intermediate data (and needs fewer accesses to each table).

Have you read these?

  • http://www.sql-server-performance.com/2006/tuning-joins/
  • http://blogs.msdn.com/b/craigfr/archive/2006/12/04/semi-join-transformation.aspx

What I mean is that with IN the database can apply better optimizations, such as removing duplicates or stopping at the first match (these are from school memories, so I'm sure it does much better than that). So I guess the question isn't why the query plans differ but how smart the optimizer is and how deep its optimizations can go.
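The "stop at first match" behaviour is what a semi-join does, and it can be spelled out by writing query 1 with EXISTS instead of IN. This is only an illustrative sketch of the semantics, not a statement about the plan SQL Server will actually choose; the table aliases a, b and c are added here for readability.

```sql
-- Sketch: query 1 expressed with EXISTS.
-- Each EXISTS only asks whether at least one matching row exists,
-- so no columns from table_a or table_b are carried into the result.
SELECT c.c_pk
FROM table_c AS c
WHERE EXISTS (SELECT 1
              FROM table_b AS b
              WHERE b.b_id = c.c_b_id
                AND b.b_z = 1)
   OR EXISTS (SELECT 1
              FROM table_a AS a
              WHERE a.a_id = c.c_a_id
                AND a.a_z = 1);
```

Query 2, by contrast, asks for outer joins that must keep every table_c row and attach the matching b_id and a_id before the WHERE clause can discard anything.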

Adriano Repetti answered Oct 30 '22 10:10
