Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to SELECT rows WHERE the ID EXISTS IN a second table

I'm looking to select all records from one table where the ID exists in a second table.

The following two queries return the correct results:

Query 1:

SELECT *
FROM Table1 t1
WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t1.ID = t2.ID)

Query 2:

SELECT *
FROM Table1 t1
WHERE t1.ID IN (SELECT t2.ID FROM Table2 t2)

Are one of these queries more efficient than the other? Should I use one over the other? Is there a third method that I didn't think of that is even more efficient?

like image 392
Tot Zam Avatar asked Jul 18 '16 19:07

Tot Zam


People also ask

How do you SELECT data from one table from another table?

One SQL code can have one or more than one nested query. Syntax: SELECT * FROM table_name WHERE column_name=( SELECT column_name FROM table_name);

How do you SELECT all records from one table that do not exist in another table?

How to Select All Records from One Table That Do Not Exist in Another Table in SQL? We can get the records in one table that doesn't exist in another table by using NOT IN or NOT EXISTS with the subqueries including the other table in the subqueries.


1 Answers

Summary:

IN and EXISTS performed similarly in all scenarios.. Below are the parameters used to validate..

Execution cost,Time:
Same for both and optimizer produced same plan.
Memory Grant:
Same for both queries
Cpu Time,Logical reads :
Exists seems to outperform IN by little bit margin in terms of CPU Time,though reads are same..

I ran each query 10 times each using below test data set..

  1. A very large subquery result set (100000 rows)
  2. Duplicate rows
  3. Null rows

For all the above scenarios, both IN and EXISTS performed in identical manner.

Some info about Performance V3 database used for testing. 20000 customers having 1000000 orders, so each customer is randomly duplicated (in a range of 10 to 100) in the orders table.

Execution cost,Time:
Below is screenshot of both queries running. Observe each query relative cost.

enter image description here

Memory Cost:
Memory grant for the two queries is also same..I Forced MDOP 1 so as not to spill them to TEMPDB..

enter image description here

CPU Time ,Reads:

For Exists:

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 469 ms,  elapsed time = 595 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

For IN:

(20000 row(s) affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 547 ms,  elapsed time = 669 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

In each case, the optimizer is smart enough to rearrange the queries.

I tend to use EXISTS only though (my opinion). One use case to use EXISTS is when you don't want to return a second table result set.

Update as per queries from Martin Smith:

I ran the below queries to find the most effective way to get rows from the first table for which a reference exists in the second table.

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid   

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

SELECT *
FROM Customers C
WHERE EXISTS(SELECT 1 FROM Orders o WHERE o.custid = c.custid)

SELECT *
FROM Customers c
WHERE custid IN (SELECT custid FROM Orders)

All the above queries share the same cost with the exception of 2nd INNER JOIN, Plan being the same for the rest.

enter image description here

Memory Grant:
This query

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid 

required memory grant of

enter image description here

This query

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid 

required memory grant of ..

enter image description here

CPU Time,Reads:
For Query :

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid   

(20000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 48, logical reads 1344, physical reads 96, read-ahead reads 1248, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1453 ms,  elapsed time = 781 ms.

For Query:

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

(20000 row(s) affected)
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1499 ms,  elapsed time = 403 ms.
like image 199
TheGameiswar Avatar answered Sep 19 '22 14:09

TheGameiswar