Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NATURAL JOIN vs WHERE IN Clauses

Tags:

sql

mysql

Recently, I dealt with retrieving a large amount of data which consists of thousands of records from a MySQL database. Since it was my first time to handle such large data set, I didn't think about the efficiency of the SQL statement. And the problem comes.

Here are the tables of the database (It is just a simple database model of a curriculum system):

course:

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| name      | varchar(20)         | NO   |     | NULL    |                |
| lecturer  | varchar(20)         | NO   |     | NULL    |                |
| credit    | float               | NO   |     | NULL    |                |
| week_from | tinyint(3) unsigned | NO   |     | NULL    |                |
| week_to   | tinyint(3) unsigned | NO   |     | NULL    |                |
+-----------+---------------------+------+-----+---------+----------------+

select:

+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| card_no   | int(10) unsigned | NO   |     | NULL    |                |
| course_id | int(10) unsigned | NO   |     | NULL    |                |
| term      | varchar(7)       | NO   |     | NULL    |                |
+-----------+------------------+------+-----+---------+----------------+

When I want to retrieve all the courses that a student has selected (with his card number), the SQL statement is

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
    SELECT course_id FROM `select` WHERE card_no=<student's card number>
);

But, it was extremely slow and it didn't return anything for a long time. So I changed WHERE IN clauses into NATURAL JOIN. Here is the SQL,

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;

It returns immediately and works fine!

So my question is:

  • What's the difference between NATURAL JOIN and WHERE IN Clauses?
  • What makes them perform differently? (Is that maybe because I doesn't set up any INDEX?)
  • When shall we use NATURAL JOIN or WHERE IN?
like image 715
Wenhao Ji Avatar asked Apr 14 '13 06:04

Wenhao Ji


People also ask

Can we use WHERE clause with natural join?

This example uses natural join syntax to merge specified columns from the Customers and Sales tables. The result set is filtered with a WHERE clause.

What is the difference between join and WHERE clause?

Rows of the outer table that do not meet the condition specified in the On clause in the join are extended with null values for subordinate columns (columns of the subordinate table), whereas the Where clause filters the rows that actually were returned to the final output.

What is the difference between natural join and using clause?

NATURAL JOIN and USING Clause are mutually exclusive. It should not have a qualifier(table name or Alias) in the referenced columns. NATURAL JOIN uses all the columns with matching names and datatypes to join the tables. The USING Clause can be used to specify only those columns that should be used for an EQUIJOIN.

Which is faster WHERE clause or join?

“Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?” No, there's no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.


2 Answers

Theoretically the two queries are equivalent. I think it's just poor implementation of the MySQL query optimizer that causes JOIN to be more efficient than WHERE IN. So I always use JOIN.

Have you looked at the output of EXPLAIN for the two queries? Here's what I got for a WHERE IN:

+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
|  1 | PRIMARY            | t_users           | ALL            | NULL              | NULL    | NULL    | NULL       | 2458304 | Using where              |
|  2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13      | func,const |       7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+

It's apparently performing the subquery, then going through every row in the main table testing whether it's in -- it doesn't use the index. For the JOIN I get:

+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table             | type   | possible_keys       | key       | key_len | ref                                   | rows | Extra       |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
|  1 | SIMPLE      | t_user_attributes | ref    | PRIMARY,attribute   | attribute | 1       | const                                 |   15 | Using where |
|  1 | SIMPLE      | t_users           | eq_ref | username,username_2 | username  | 12      | bbodb_test.t_user_attributes.username |    1 |             |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+

Now it uses the index.

like image 96
Barmar Avatar answered Oct 14 '22 14:10

Barmar


Try this:

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
    SELECT s.course_id 
    FROM `select` s
    WHERE card_no=<student's card number>
    AND   c.course_id = s.course_id
);

Notice the addition of the AND clause in the sub-query. This is called a co-related sub-query because it relates the two course_ids, just as the NATURAL JOIN does.

I think Barmar's index explanation is on the mark.

like image 37
Carl Avatar answered Oct 14 '22 15:10

Carl