
MySQL: Finding rows that don't take part in a relationship

Tags: sql, join, mysql, entity-relationship

I have two tables: 'movies' and 'users'. There's an n:m relationship between them, describing which movies a user has seen. This is stored in a table 'seen'. Now I want to find, for a given user, all the movies he has not seen. My current solution is like this:

    SELECT *
    FROM movies
    WHERE movies.id NOT IN (
        SELECT seen.movie_id
        FROM seen
        WHERE seen.user_id = 123
    )

This works fine but does not seem to scale very well. Is there a better approach to this?


asked Feb 12 '09 23:02

tliff

2 Answers

Here's a typical way to do this query without using the subquery method you showed. This may satisfy @Godeke's request to see a join-based solution.

    SELECT *
    FROM movies m
    LEFT OUTER JOIN seen s
      ON (m.id = s.movie_id AND s.user_id = 123)
    WHERE s.movie_id IS NULL;

However, in most brands of database this solution can perform worse than the subquery solution. It's best to use EXPLAIN to analyze both queries, to see which one will do better given your schema and data.

Here's another variation on the subquery solution:

    SELECT *
    FROM movies m
    WHERE NOT EXISTS (SELECT * FROM seen s
                      WHERE s.movie_id = m.id
                        AND s.user_id = 123);

This is a correlated subquery, which must be evaluated for every row of the outer query. Usually this is expensive, and your original example query is better. On the other hand, in MySQL "NOT EXISTS" is often better than "column NOT IN (...)".

Again, you must test each solution and compare the results to be sure. It's a waste of time to choose any solution without measuring performance.
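One way to act on this advice is a small harness that loads synthetic data, checks that all three queries return the same rows, and times each one. The sketch below rests on several assumptions: it uses Python's built-in sqlite3 module rather than MySQL, the table sizes and the composite primary key on 'seen' are made up, and the queries select only the movie id so that the results can be compared directly. SQLite timings will not transfer to MySQL; only the shape of the comparison does.

```python
import random
import sqlite3
import time

# Hypothetical sizes; adjust them to resemble your own data.
N_MOVIES = 2000
N_USERS = 50
SEEN_PER_USER = 300
USER_ID = 123  # the user from the question

# The three query shapes discussed above, reduced to the movie id.
QUERIES = {
    "NOT IN": """
        SELECT m.id FROM movies m
        WHERE m.id NOT IN (SELECT s.movie_id FROM seen s WHERE s.user_id = ?)
    """,
    "LEFT JOIN": """
        SELECT m.id FROM movies m
        LEFT OUTER JOIN seen s ON (m.id = s.movie_id AND s.user_id = ?)
        WHERE s.movie_id IS NULL
    """,
    "NOT EXISTS": """
        SELECT m.id FROM movies m
        WHERE NOT EXISTS (SELECT * FROM seen s
                          WHERE s.movie_id = m.id AND s.user_id = ?)
    """,
}


def build_db(seed=0):
    """Create an in-memory database filled with random 'seen' rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE seen ("
        " user_id INTEGER NOT NULL,"
        " movie_id INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, movie_id))"  # assumed key, not from the question
    )
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?)",
        [(i, f"movie {i}") for i in range(1, N_MOVIES + 1)],
    )
    user_ids = [USER_ID] + list(range(1, N_USERS))
    for uid in user_ids:
        for mid in rng.sample(range(1, N_MOVIES + 1), SEEN_PER_USER):
            conn.execute("INSERT INTO seen VALUES (?, ?)", (uid, mid))
    conn.commit()
    return conn


def compare(conn):
    """Run each query once; return {name: (sorted ids, seconds)}."""
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql, (USER_ID,)).fetchall()
        elapsed = time.perf_counter() - start
        results[name] = (sorted(r[0] for r in rows), elapsed)
    return results


conn = build_db()
results = compare(conn)
for name, (ids, elapsed) in results.items():
    print(f"{name:10s} {len(ids)} unseen movies in {elapsed * 1000:.2f} ms")
```

Against a real MySQL server you would run the same three statements on your own tables and prefix each with EXPLAIN to see the chosen plan, as suggested above.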

124

answered Sep 22 '22 07:09

Bill Karwin

Not only does your query work, it's the right approach to the problem as stated. Perhaps you can find a different way to approach the problem? A simple LIMIT on your outer select should be very fast even for large tables, for instance.
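To illustrate the LIMIT suggestion: if the application only shows one page of unseen movies at a time, the outer query can stop as soon as it has enough rows. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The data, the page size of 5, and the ORDER BY (added so the page is deterministic) are assumptions, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE seen ("
    " user_id INTEGER NOT NULL,"
    " movie_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, movie_id))"  # assumed key, not from the question
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(i, f"movie {i}") for i in range(1, 101)],
)
# Made-up data: user 123 has seen every even-numbered movie.
conn.executemany(
    "INSERT INTO seen VALUES (123, ?)",
    [(i,) for i in range(2, 101, 2)],
)

PAGE_SIZE = 5  # hypothetical page size

# Same NOT IN query as in the question, limited to one page of results.
page = conn.execute(
    """
    SELECT m.id, m.title
    FROM movies m
    WHERE m.id NOT IN (SELECT s.movie_id FROM seen s WHERE s.user_id = ?)
    ORDER BY m.id
    LIMIT ?
    """,
    (123, PAGE_SIZE),
).fetchall()

for movie_id, title in page:
    print(movie_id, title)
```

Whether this helps depends on the application: it only applies when a partial list of unseen movies is acceptable.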

answered Sep 21 '22 07:09

dwc
