Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL inner join vs subquery

I am working on following queries:

Query 1: SELECT * From TabA INNER JOIN TabB on TabA.Id=TabB.Id
Query 2: SELECT * From TabA WHERE Id in (SELECT Id FROM TabB)
Query 3: SELECT TabA.* From TabA INNER JOIN TabB on TabA.Id=TabB.Id

I investigate these queries with SQL Server profiler and found some interesting facts.

  • Query 1 takes 2.312 Seconds
  • Query 2 takes 0.811 Seconds
  • Query 3 takes 0.944 Seconds

TabA 48716 rows

TabB 62719 rows

Basically what I am asking is why Query 1 is taking long time, not Query 3. I already know that 'sub query' is slower than inner join but here Query 2 is fastest; why?

like image 916
Mohsin JK Avatar asked May 02 '13 14:05

Mohsin JK


1 Answers

If I had to guess I would say it's because query 1 is pulling the data from both tables. Queries 2 and 3 (aprox the same time) are only pulling data for TabA.

One way you could check this is by running the following:

SET STATISTICS TIME ON
SET STATISTICS IO ON

When I ran

SELECT * FROM sys.objects

I saw the following results.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 104 ms.

(242 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syssingleobjrefs'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syspalnames'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 866 ms.

You can take a look at # of scans, logical reads and physical reads for each query. Physical reads of course take much longer and represent reading from the disk into the cache. If all of your reads are logical reads then your table is completely in cache.

I would be willing to bet if you look you will see a lot more logical reads on TabB on query 1 than on 2 and 3.

EDIT:

Just out of curiosity I did some tests and blogged the results here.

like image 135
Kenneth Fisher Avatar answered Sep 30 '22 04:09

Kenneth Fisher