Say I have three tables with the following data in them:
CREATE TABLE movies (
movie_id INT,
movie_name VARCHAR(255),
PRIMARY KEY (movie_id)
);
CREATE TABLE movie_ratings (
movie_rating_id INT,
movie_id INT,
rating_value TINYINT,
PRIMARY KEY (movie_rating_id),
KEY movie_id (movie_id)
);
CREATE TABLE movie_actors (
movie_actor_id INT,
movie_id INT,
actor_id INT,
PRIMARY KEY (movie_actor_id),
KEY movie_id (movie_id)
);
INSERT INTO movies VALUES (1, 'Titanic'),(2,'Star Trek');
INSERT INTO movie_ratings VALUES (1,1,5),(2,1,4),(3,1,5);
INSERT INTO movie_actors VALUES (1,1,2),(2,2,2);
If I wanted to get the average rating and number of actors for each movie, I could do this using JOINs
:
SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON m.movie_id = a.movie_id
GROUP BY m.movie_id;
Let's call that query A. Query A can be rewritten with USING
thusly:
SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r USING (movie_id)
LEFT JOIN movie_actors a USING (movie_id)
GROUP BY m.movie_id;
Let's call that query B.
Both of those queries return 1 as numActors for the movie 'Star Trek'. So let's modify that query a bit:
SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON r.movie_id = a.movie_id
GROUP BY m.movie_id;
Let's call this query C. Instead of doing m.movie_id = a.movie_id
I'm now doing r.movie_id = a.movie_id
. For query C numActors is 0.
My questions are:
USING
? Can I?USING
essentially doing an ON
with the current table and the table mentioned in FROM
?USING
do when an implicit JOIN
is used and multiple tables are in the FROM
?If the column name is the same in both tables then yes, you can use USING()
.
In other words, this:
SELECT movie_name, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON m.movie_id = a.movie_id
GROUP BY m.movie_id;
Is the same as:
SELECT movie_name, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings USING (movie_id)
LEFT JOIN movie_actors USING (movie_id)
GROUP BY movie_id;
As far as the ambiguity there won't be any here. It will join the tables when the movie_id is equal. In your select statement, you are pulling the movie_name, which only exists in one column.
However, if you said this:
SELECT movie_id, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors
MySQL will say there is an error because movie_id cannot be resolved because it as ambiguous. To fix this ambiguity, you'd just have to make sure you used a table alias or name when selecting movie_id.
This is a valid select statement:
SELECT m.movie_id, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors
No error would be thrown for this.
I would like to comment that I foresee some danger here. If you left join movies with all of these tables, you could potentially receive null values. If movie_id 1 does not have any ratings, your AVG(rating_value) will return null. You won't have this problem for COUNT(actor_id) as this will just return 0. I don't know if this bothers you, but be aware that that column could return null.
I built the sample tables in MySQL workbench, and I'm unable to get SQL Fiddle to work to show you, but if you would like to see the data I've created let me know and I will edit the question.
Yes, you can, using a nested join:
SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN (
movie_ratings r
LEFT JOIN movie_actors a USING (movie_id)
) USING (movie_id)
GROUP BY m.movie_id
No. MySQL Documentation says:
The evaluation of multi-way natural joins differs in a very important way that affects the result of NATURAL or USING joins and that can require query rewriting. Suppose that you have three tables t1(a,b), t2(c,b), and t3(a,c) that each have one row: t1(1,2), t2(10,2), and t3(7,10). Suppose also that you have this NATURAL JOIN on the three tables:
SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;
Previously, the left operand of the second join was considered to be t2, whereas it should be the nested join (t1 NATURAL JOIN t2). As a result, the columns of t3 are checked for common columns only in t2, and, if t3 has common columns with t1, these columns are not used as equi-join columns. Thus, previously, the preceding query was transformed to the following equi-join:
SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c;
So basically, in older versions of MySQL your query B was not the same as query A, but as query C!
Again, citing the MySQL Documentation:
Previously, the comma operator (,) and JOIN both had the same precedence, so the join expression t1, t2 JOIN t3 was interpreted as ((t1, t2) JOIN t3). Now JOIN has higher precedence, so the expression is interpreted as (t1, (t2 JOIN t3)). This change affects statements that use an ON clause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.
It's all about join-order and precedence. So basically t1, t2 JOIN t3 USING (x)
would do t2 JOIN t3 USING(x)
first and join that with t1
.
There is no ambiguity as USING applies to the tables in the join so this query
SELECT movie_name, AVG(rating_value), COUNT(actor_id)
FROM movies m
LEFT JOIN movie_ratings r USING (movie_id)
LEFT JOIN movie_actors a USING (movie_id)
GROUP BY m.movie_id;
is pretty much equivalent to the one with inner joins except that the movie_id
column should only appear once in the results, instead of three times in theinner join
case.
See this example for the column elimination: http://ideone.com/qMj5XK (using SQLite I think, SQL Fiddle wouldn't work but MySQL should behave in the same way).
How can I write query C using USING? Can I?
Like jpw mentionned in is answer yes you can use USING
with query C. It will join m
with r
using movie_id
and m
with a
also using movie_id
. In fact USING
with MySQL is aligned with the SQL 2003 standard.
Is USING essentially doing an ON with the current table and the table mentioned in FROM?
Yes USING
is doing an ON with the current table and the table mentioned in the FROM
clause. The only difference is the number columns you are going to end with if you use an asterisk in the SELECT
clause. The Oracle documentation for USING
is much more helpful than the MySQL documentation about that.
If the answer to #2 is yes then what does USING do when an implicit JOIN is used and multiple tables are in the FROM?
You can try it for yourself but I'm pretty sure it wouldn't work with an implicit join (FROM tableA, tableB
). This might be just another reason why implicit joins should be avoided.
Also since USING
can only be used with explicit join that would mean a very awkward query mixing both explicit and implicit join. Something you probably want to avoid.
Edit :
By the way, numActors
is 0 in query C because your join are incorrect. In fact if there are no movie rating then there are no actors! If you fix that you should get the same result than query B.
SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON m.movie_id = a.movie_id -- Instead of r.movie_id = a.movie_id
GROUP BY m.movie_id;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With