
Efficient way to select all values from one column not in another column

I need to return all values of colA in mytable that do not appear in colB. I am using:

SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)

It works; however, the query takes an excessively long time to complete.

Is there a more efficient way to do this?

Flash asked Jan 05 '12 03:01



2 Answers

You can use NOT EXISTS:

select distinct
    colA
from
    mytable m1
where
    not exists (select 1 from mytable m2 where m2.colB = m1.colA)

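NOT EXISTS lets the database run an anti-join: for each row it only has to probe for a matching colB value, and it can stop as soon as it finds one. NOT IN is logically a comparison against every value the subquery returns (colA <> v1 AND colA <> v2 AND ...), and some optimizers execute it by building the full subquery result first. NOT EXISTS is therefore typically faster for values stored in tables. It also handles NULLs more predictably: if colB contains even one NULL, the NOT IN version returns no rows at all.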
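A minimal sketch to compare the two forms yourself, using Python's built-in sqlite3 module as a stand-in DBMS (an assumption; the question does not name one). The table and column names follow the question, but the rows are made up. It runs both queries on the same data, then again after adding a single NULL to colB. Because colA <> NULL evaluates to unknown, the NOT IN form then filters out every row, while NOT EXISTS is unaffected.

```python
import sqlite3

# Hypothetical sample rows (colA, colB); names follow the question.
ROWS = [(1, 2), (2, 3), (3, 3), (4, 2), (4, 5)]

NOT_EXISTS = """
    SELECT DISTINCT colA
    FROM   mytable m1
    WHERE  NOT EXISTS (SELECT 1 FROM mytable m2 WHERE m2.colB = m1.colA)
    ORDER  BY colA
"""

NOT_IN = """
    SELECT DISTINCT colA
    FROM   mytable
    WHERE  colA NOT IN (SELECT colB FROM mytable)
    ORDER  BY colA
"""


def run(rows, query):
    """Load rows into an in-memory table and return the first result column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (colA INTEGER, colB INTEGER)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        return [r[0] for r in conn.execute(query)]
    finally:
        conn.close()


# Without NULLs in colB, both forms agree.
print(run(ROWS, NOT_EXISTS), run(ROWS, NOT_IN))  # -> [1, 4] [1, 4]

# One NULL in colB: NOT IN returns nothing, NOT EXISTS still works.
rows_with_null = ROWS + [(6, None)]
print(run(rows_with_null, NOT_EXISTS), run(rows_with_null, NOT_IN))  # -> [1, 4, 6] []
```

If colB is declared NOT NULL, the two forms return the same rows and the choice comes down to how your DBMS plans them.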

Eric answered Oct 19 '22 20:10


In standard SQL, DISTINCT takes no parentheses: write DISTINCT colA, not DISTINCT(colA). DISTINCT is not a function.

SELECT DISTINCT colA
FROM   mytable
WHERE  colA NOT IN (SELECT DISTINCT colB FROM mytable);

I added DISTINCT to the subquery as well; if colB has many duplicates, that could speed up the query.

A CTE might be faster, depending on your DBMS. I also demonstrate LEFT JOIN as an alternative way to exclude the values in colB, and GROUP BY as an alternative way to get distinct values:

WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM   mytable m
LEFT   JOIN x ON x.colB = m.colA
WHERE  x.colB IS NULL
GROUP  BY m.colA;

Or, simplified further, as a plain self-join without the CTE (probably fastest):

SELECT DISTINCT m.colA
FROM   mytable m
LEFT   JOIN mytable x ON x.colB = m.colA
WHERE  x.colB IS NULL;
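A quick way to check that these rewrites are interchangeable on your data is to run them side by side. This sketch again assumes SQLite (via Python's sqlite3) and hypothetical sample rows; it only checks that the results match and says nothing about relative speed, which depends on your DBMS and indexes.

```python
import sqlite3

# Hypothetical sample rows (colA, colB); names follow the question.
ROWS = [(1, 2), (2, 3), (3, 3), (4, 2), (4, 5), (1, 5)]

QUERIES = {
    "not_in_distinct": """
        SELECT DISTINCT colA
        FROM   mytable
        WHERE  colA NOT IN (SELECT DISTINCT colB FROM mytable)
    """,
    "cte_left_join": """
        WITH x AS (SELECT colB FROM mytable GROUP BY colB)
        SELECT m.colA
        FROM   mytable m
        LEFT   JOIN x ON x.colB = m.colA
        WHERE  x.colB IS NULL
        GROUP  BY m.colA
    """,
    "self_left_join": """
        SELECT DISTINCT m.colA
        FROM   mytable m
        LEFT   JOIN mytable x ON x.colB = m.colA
        WHERE  x.colB IS NULL
    """,
}


def compare(rows):
    """Run every variant on the same data; return {name: sorted colA values}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (colA INTEGER, colB INTEGER)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        return {name: sorted(r[0] for r in conn.execute(sql))
                for name, sql in QUERIES.items()}
    finally:
        conn.close()


results = compare(ROWS)
for name, values in results.items():
    print(name, values)  # each variant prints [1, 4] for these rows
```

The LEFT JOIN variants keep a row of m only when no x row matched, which is why x.colB IS NULL selects exactly the colA values absent from colB.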

There are basically four techniques to exclude rows with keys present in another (or the same) table; see the related question:

  • Select rows which are not present in other table

The deciding factor for speed will be indexes. You need to have indexes on colA and colB for this query to be fast.
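To confirm that an index is actually used, inspect the query plan. A minimal SQLite sketch via Python's sqlite3 (the index names idx_cola and idx_colb and the generated rows are hypothetical): after creating one index per column, EXPLAIN QUERY PLAN for the NOT EXISTS form should show the correlated lookup searching idx_colb instead of scanning the whole table. The exact plan wording varies between SQLite versions, and other DBMSs have their own EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB INTEGER)")
# Synthetic rows, only so the planner has something to work with.
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(i % 50, i % 37) for i in range(1000)])

# Hypothetical index names: one index per column, as the answer suggests.
conn.execute("CREATE INDEX idx_cola ON mytable (colA)")
conn.execute("CREATE INDEX idx_colb ON mytable (colB)")

QUERY = """
    SELECT DISTINCT colA
    FROM   mytable m1
    WHERE  NOT EXISTS (SELECT 1 FROM mytable m2 WHERE m2.colB = m1.colA)
"""

# EXPLAIN QUERY PLAN returns 4-column rows; the last one is the readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in plan:
    print(line)

conn.close()
```

If the plan shows a full SCAN where you expected a SEARCH on the index, the index is missing or not usable for that comparison.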

Erwin Brandstetter answered Oct 19 '22 18:10