Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: Performance comparison for exclusion (Join vs Not in)

I am curious on the most efficient way to query exclusion on sql. E.g. There are 2 tables (tableA and tableB) which can be joined on 1 column (col1). I want to display the data of tableA for all the rows which col1 does not exist in tableB.

(So, in other words, tableB contains a subset of col1 of tableA. And I want to display tableA without the data that exists in tableB)

Let's say tableB has 100 rows while tableA is gigantic (more than 1M rows). I know 'Not in (not exists)' can be used but perhaps there are more efficient ways (less comp. time) to do it.? I don't maybe with outer joins?

Code snippets and comments are much appreciated.

like image 282
someone Avatar asked Jun 21 '10 09:06

someone


People also ask

Which is faster not in or left join?

The LEFT JOIN query is slower than the INNER JOIN query because it's doing more work. From the EXPLAIN output, it looks like MySQL is doing nested loop join.

Which is faster in SQL in or not in?

Your answer So IN is faster in some circumstances. The best way to know is to profile both on your database with your specific data to see which is faster. So, in this case, the method using OR is about 30% slower. Adding more terms makes the difference larger.

Which is faster not in or not EXISTS?

NOT IN vs NOT EXISTS performance in SQL Server Regarding performance aspects, SQL NOT EXISTS would be a better choice over SQL NOT IN. NOT EXISTS is significantly faster than NOT IN especially when the subquery result is very large.

Are inner joins more performant than left joins?

If you dont include the items of the left joined table, in the select statement, the left join will be faster than the same query with inner join. If you do include the left joined table in the select statement, the inner join with the same query was equal or faster than the left join.


2 Answers

Depends on the RDBMS. For Microsoft SQL Server NOT EXISTS is preferred to the OUTER JOIN as it can use the more efficient Anti-Semi join.

For Oracle Minus is apparently preferred to NOT EXISTS (where suitable)

You would need to look at the execution plans and decide.

like image 117
Martin Smith Avatar answered Nov 04 '22 19:11

Martin Smith


I prefer to use

Select a.Col1
From TableA a
Left Join TableB b on a.Col1 = b.Col1
Where b.Col1 Is Null

I believe this will be quicker as you are utilising the FK constraint (providing you have them of course)

Sample data:

create table #a
(
Col1 int
)
Create table #b
(
col1 int
)

insert into #a
Values (1)
insert into #a
Values (2)
insert into #a
Values (3)
insert into #a
Values (4)

insert into #b
Values (1)
insert into #b
Values (2)


Select a.Col1
From #a a 
Left Join #b b on a.col1 = b.Col1
Where b.Col1 is null
like image 23
codingbadger Avatar answered Nov 04 '22 19:11

codingbadger