Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I perform a SQL 'NOT IN' query faster?

I have a table (EMAIL) of email addresses:

EmailAddress
------------
[email protected]
[email protected]
[email protected]
[email protected]

and a table (BLACKLIST) of blacklisted email addresses:

EmailAddress
------------
[email protected]
[email protected]

and I want to select those email addresses that are in the EMAIL table but NOT in the BLACKLIST table. I'm doing:

SELECT EmailAddress
FROM EMAIL
WHERE EmailAddress NOT IN
   (
      SELECT EmailAddress
      FROM BLACKLIST
   )

but when the row counts get very high the performance is terrible.

How can I better do this? (Assume generic SQL if possible. If not, assume T-SQL.)

like image 234
Howiecamp Avatar asked Feb 10 '12 16:02

Howiecamp


2 Answers

You can use a left outer join, or a not exists clause.

Left outer join:

select E.EmailAddress
  from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
 where B.EmailAddress is null;

Not Exists:

select E.EmailAddress
  from EMAIL E where not exists
         (select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)

Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in one.

As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress), that should help speed up the execution of your query.

like image 145
Pablo Santa Cruz Avatar answered Oct 14 '22 13:10

Pablo Santa Cruz


NOT IN differs from NOT EXISTS if the blacklist allow null value as EmailAddress. If there is a single null value the result of the query will always return zero rows because NOT IN (null) is unknown / false for every value. The query plans therefore differs slighyly but I don't think there would be any serious performance impact.

A suggestion is to create a new table called VALIDEMAIL, add a trigger to BLACKLIST that removes addresses from VALIDEMAIL when rows are inserted and add to VALIDEMAIL when removed from BLACKLIST. Then replace EMAIL with a view that is a union of both VALIDEMAIL and BLACKLIST.

like image 24
Daniel Gustafsson Avatar answered Oct 14 '22 14:10

Daniel Gustafsson