Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query to exclude rows whose ID is listed in another table

Tags:

sql

mysql

I'm having a hard time figuring out how to do an optimized query to do the following, even though it sounds simple.

Suppose I have a table called promo (one column: ID), and another table called promo_has_been_displayed_to_user (two columns: promo_id and user_id, and promo_id is a foreign key referencing promo.ID). I want a query that will return all the rows in promo whose ID field is NOT mentioned in any row in promo_has_been_displayed_to_user where the promo_has_been_displayed_to_user.user_id field is set to 45. Let's say I have indexes on all fields.

(The idea is that I have a database of promotional ads and a database of users, and each time I show an ad to a user, I store in promo_has_been_displayed_to_user that it has been shown to them. Now I want to find a new ad that has not been displayed to user 45.)

It seems the theoretically optimal way to do this would be the following:

1) Get a subset of promo_has_been_displayed_to_user where user_id=45, and in that subset, maintain the index on the user_id field. 2) For each row in promo, take the ID and look it up in the subset generated in step 1, on the indexed promo_id field. 3) Return all the rows in promo where you didn't find a match in step 2.

But how do I structure a query that reflects that?

Now, I have at least two queries that will return the right answer (I have verified with test data); the problem is that I don't think they will run optimally, for the following reasons:

1)

select * from promo
where ID not in (select promo_id from promo_has_been_displayed_to_user
                 where user_id=45)

The trouble with this query is that once you have the list of IDs returned by "select promo_id from promo_has_been_displayed_to_user where user_id=45", I assume it's just a list (without an index onit), and that the "not in" check is implemented by just checking that list one at a time. If the subset of promo_has_been_displayed_to_user where user_id=45 turns out to be huge, then for each row in promo, we have to search through a huge list with no index on it.

2)

select * from promo p
where not exists (select * from promo_has_been_displayed_to_user
                  where promo_id = p.ID and user_id=45)

This time, we are doing the lookup on the indexed promo_id field. However, for every row in promo, I'm querying the entire promo_has_been_displayed_to_user table. This is wasteful if there's only a small subset of promo_has_been_displayed_to_user where user_id=45.

Is there a single query that will combine the best of both worlds -- I first whittle down promo_has_been_displayed_to_user to the subset where user_id=45, and then for each row in promo, I do an indexed lookup on promo_id to see if there's a matching row in the subset?

(This is MySQL 5.0.95, although this sounds like something that isn't database-server-specific.)

like image 423
Bennett Avatar asked Jan 04 '23 09:01

Bennett


1 Answers

You cannot do this with an inner join. What you need to do is an antijoin. Usually this is most easily done using a query like this:

 SELECT * FROM A WHERE id NOT IN (SELECT id FROM B);

That is the basic syntax of an antojoin in SQL.

An alternative way to do this however is to turn a LEFT JOIN into an antijoin and on some databases this performs better:

SELECT A.* 
  FROM A
  LEFT JOIN B ON A.id = B.id
 WHERE A.id IS NOT NULL AND B.id IS NULL;

This turns out to be equivalent and some databases can optimize it better.

like image 109
Chris Travers Avatar answered Jan 08 '23 09:01

Chris Travers