I have a table (EMAIL) of email addresses:
EmailAddress
------------
[email protected]
[email protected]
[email protected]
[email protected]
and a table (BLACKLIST) of blacklisted email addresses:
EmailAddress
------------
[email protected]
[email protected]
and I want to select those email addresses that are in the EMAIL table but NOT in the BLACKLIST table. I'm doing:
SELECT EmailAddress
FROM EMAIL
WHERE EmailAddress NOT IN
(
SELECT EmailAddress
FROM BLACKLIST
)
but when the row counts get very high the performance is terrible.
How can I better do this? (Assume generic SQL if possible. If not, assume T-SQL.)
You can use a left outer join, or a not exists
clause.
Left outer join:
select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;
Not Exists:
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in
one.
As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress)
, that should help speed up the execution of your query.
NOT IN differs from NOT EXISTS if the blacklist allow null value as EmailAddress. If there is a single null value the result of the query will always return zero rows because NOT IN (null) is unknown / false for every value. The query plans therefore differs slighyly but I don't think there would be any serious performance impact.
A suggestion is to create a new table called VALIDEMAIL, add a trigger to BLACKLIST that removes addresses from VALIDEMAIL when rows are inserted and add to VALIDEMAIL when removed from BLACKLIST. Then replace EMAIL with a view that is a union of both VALIDEMAIL and BLACKLIST.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With