Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between USING and ON when joining more than two tables

Tags:

sql

join

mysql

Say I have three tables with the following data in them:

CREATE TABLE movies (
  movie_id INT,
  movie_name VARCHAR(255),
  PRIMARY KEY (movie_id)
);
CREATE TABLE movie_ratings (
  movie_rating_id INT,
  movie_id INT,
  rating_value TINYINT,
  PRIMARY KEY (movie_rating_id),
  KEY movie_id (movie_id)
);
CREATE TABLE movie_actors (
  movie_actor_id INT,
  movie_id INT,
  actor_id INT,
  PRIMARY KEY (movie_actor_id),
  KEY movie_id (movie_id)
);
INSERT INTO movies VALUES (1, 'Titanic'),(2,'Star Trek');
INSERT INTO movie_ratings VALUES (1,1,5),(2,1,4),(3,1,5);
INSERT INTO movie_actors VALUES (1,1,2),(2,2,2);

If I wanted to get the average rating and number of actors for each movie, I could do this using JOINs:

SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON m.movie_id = a.movie_id
GROUP BY m.movie_id;

Let's call that query A. Query A can be rewritten with USING thusly:

SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r USING (movie_id)
LEFT JOIN movie_actors a USING (movie_id)
GROUP BY m.movie_id;

Let's call that query B.

Both of those queries return 1 as numActors for the movie 'Star Trek'. So let's modify that query a bit:

SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON r.movie_id = a.movie_id
GROUP BY m.movie_id;

Let's call this query C. Instead of doing m.movie_id = a.movie_id I'm now doing r.movie_id = a.movie_id. For query C numActors is 0.

My questions are:

  1. How can I write query C using USING? Can I?
  2. Is USING essentially doing an ON with the current table and the table mentioned in FROM?
  3. If the answer to #2 is yes then what does USING do when an implicit JOIN is used and multiple tables are in the FROM?
like image 668
neubert Avatar asked Oct 27 '14 21:10

neubert


4 Answers

If the column name is the same in both tables then yes, you can use USING().

In other words, this:

SELECT movie_name, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON m.movie_id = a.movie_id
GROUP BY m.movie_id;

Is the same as:

SELECT movie_name, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings USING (movie_id)
LEFT JOIN movie_actors USING (movie_id)
GROUP BY movie_id;

As far as the ambiguity there won't be any here. It will join the tables when the movie_id is equal. In your select statement, you are pulling the movie_name, which only exists in one column.

However, if you said this:

SELECT movie_id, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors

MySQL will say there is an error because movie_id cannot be resolved because it as ambiguous. To fix this ambiguity, you'd just have to make sure you used a table alias or name when selecting movie_id.

This is a valid select statement:

SELECT m.movie_id, AVG(rating_value) AS averageRating, COUNT(actor_id) AS numActors

No error would be thrown for this.

I would like to comment that I foresee some danger here. If you left join movies with all of these tables, you could potentially receive null values. If movie_id 1 does not have any ratings, your AVG(rating_value) will return null. You won't have this problem for COUNT(actor_id) as this will just return 0. I don't know if this bothers you, but be aware that that column could return null.

I built the sample tables in MySQL workbench, and I'm unable to get SQL Fiddle to work to show you, but if you would like to see the data I've created let me know and I will edit the question.

like image 146
AdamMc331 Avatar answered Oct 19 '22 20:10

AdamMc331


1. Can C be rewritten using USING?

Yes, you can, using a nested join:

SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN (
  movie_ratings r 
  LEFT JOIN movie_actors a USING (movie_id)
) USING (movie_id)
GROUP BY m.movie_id

2. Is USING essentially doing an ON with the current table and the table mentioned in FROM?

No. MySQL Documentation says:

The evaluation of multi-way natural joins differs in a very important way that affects the result of NATURAL or USING joins and that can require query rewriting. Suppose that you have three tables t1(a,b), t2(c,b), and t3(a,c) that each have one row: t1(1,2), t2(10,2), and t3(7,10). Suppose also that you have this NATURAL JOIN on the three tables:

SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;

Previously, the left operand of the second join was considered to be t2, whereas it should be the nested join (t1 NATURAL JOIN t2). As a result, the columns of t3 are checked for common columns only in t2, and, if t3 has common columns with t1, these columns are not used as equi-join columns. Thus, previously, the preceding query was transformed to the following equi-join:

SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c;

So basically, in older versions of MySQL your query B was not the same as query A, but as query C!

3. What does USING do when an implicit JOIN is used and multiple tables are in the FROM?

Again, citing the MySQL Documentation:

Previously, the comma operator (,) and JOIN both had the same precedence, so the join expression t1, t2 JOIN t3 was interpreted as ((t1, t2) JOIN t3). Now JOIN has higher precedence, so the expression is interpreted as (t1, (t2 JOIN t3)). This change affects statements that use an ON clause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.

It's all about join-order and precedence. So basically t1, t2 JOIN t3 USING (x) would do t2 JOIN t3 USING(x) first and join that with t1.

like image 44
wolfgangwalther Avatar answered Oct 19 '22 19:10

wolfgangwalther


There is no ambiguity as USING applies to the tables in the join so this query

SELECT movie_name, AVG(rating_value), COUNT(actor_id)
FROM movies m
LEFT JOIN movie_ratings r USING (movie_id)
LEFT JOIN movie_actors a USING (movie_id)
GROUP BY m.movie_id;

is pretty much equivalent to the one with inner joins except that the movie_idcolumn should only appear once in the results, instead of three times in theinner joincase.

See this example for the column elimination: http://ideone.com/qMj5XK (using SQLite I think, SQL Fiddle wouldn't work but MySQL should behave in the same way).

like image 2
jpw Avatar answered Oct 19 '22 19:10

jpw


How can I write query C using USING? Can I?

Like jpw mentionned in is answer yes you can use USING with query C. It will join m with rusing movie_id and m with a also using movie_id. In fact USING with MySQL is aligned with the SQL 2003 standard.

Is USING essentially doing an ON with the current table and the table mentioned in FROM?

Yes USING is doing an ON with the current table and the table mentioned in the FROM clause. The only difference is the number columns you are going to end with if you use an asterisk in the SELECT clause. The Oracle documentation for USING is much more helpful than the MySQL documentation about that.

If the answer to #2 is yes then what does USING do when an implicit JOIN is used and multiple tables are in the FROM?

You can try it for yourself but I'm pretty sure it wouldn't work with an implicit join (FROM tableA, tableB). This might be just another reason why implicit joins should be avoided.

Also since USING can only be used with explicit join that would mean a very awkward query mixing both explicit and implicit join. Something you probably want to avoid.

Edit :

By the way, numActors is 0 in query C because your join are incorrect. In fact if there are no movie rating then there are no actors! If you fix that you should get the same result than query B.

SELECT m.movie_name, AVG(rating_value) AS avgRating, COUNT(actor_id) AS numActors
FROM movies m
LEFT JOIN movie_ratings r ON m.movie_id = r.movie_id
LEFT JOIN movie_actors a ON m.movie_id = a.movie_id -- Instead of r.movie_id = a.movie_id
GROUP BY m.movie_id;
like image 1
ForguesR Avatar answered Oct 19 '22 19:10

ForguesR