Recently, I dealt with retrieving a large amount of data which consists of thousands of records from a MySQL database. Since it was my first time to handle such large data set, I didn't think about the efficiency of the SQL statement. And the problem comes.
Here are the tables of the database (It is just a simple database model of a curriculum system):
course:
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| lecturer | varchar(20) | NO | | NULL | |
| credit | float | NO | | NULL | |
| week_from | tinyint(3) unsigned | NO | | NULL | |
| week_to | tinyint(3) unsigned | NO | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
select:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| card_no | int(10) unsigned | NO | | NULL | |
| course_id | int(10) unsigned | NO | | NULL | |
| term | varchar(7) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
When I want to retrieve all the courses that a student has selected (with his card number), the SQL statement is
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
SELECT course_id FROM `select` WHERE card_no=<student's card number>
);
But, it was extremely slow and it didn't return anything for a long time.
So I changed WHERE IN
clauses into NATURAL JOIN
. Here is the SQL,
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;
It returns immediately and works fine!
So my question is:
NATURAL JOIN
and WHERE IN
Clauses?INDEX
?)NATURAL JOIN
or WHERE IN
?This example uses natural join syntax to merge specified columns from the Customers and Sales tables. The result set is filtered with a WHERE clause.
Rows of the outer table that do not meet the condition specified in the On clause in the join are extended with null values for subordinate columns (columns of the subordinate table), whereas the Where clause filters the rows that actually were returned to the final output.
NATURAL JOIN and USING Clause are mutually exclusive. It should not have a qualifier(table name or Alias) in the referenced columns. NATURAL JOIN uses all the columns with matching names and datatypes to join the tables. The USING Clause can be used to specify only those columns that should be used for an EQUIJOIN.
“Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?” No, there's no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.
Theoretically the two queries are equivalent. I think it's just poor implementation of the MySQL query optimizer that causes JOIN to be more efficient than WHERE IN. So I always use JOIN.
Have you looked at the output of EXPLAIN for the two queries? Here's what I got for a WHERE IN
:
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
| 1 | PRIMARY | t_users | ALL | NULL | NULL | NULL | NULL | 2458304 | Using where |
| 2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13 | func,const | 7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
It's apparently performing the subquery, then going through every row in the main table testing whether it's in -- it doesn't use the index. For the JOIN I get:
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| 1 | SIMPLE | t_user_attributes | ref | PRIMARY,attribute | attribute | 1 | const | 15 | Using where |
| 1 | SIMPLE | t_users | eq_ref | username,username_2 | username | 12 | bbodb_test.t_user_attributes.username | 1 | |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
Now it uses the index.
Try this:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
SELECT s.course_id
FROM `select` s
WHERE card_no=<student's card number>
AND c.course_id = s.course_id
);
Notice the addition of the AND clause in the sub-query. This is called a co-related sub-query because it relates the two course_ids, just as the NATURAL JOIN does.
I think Barmar's index explanation is on the mark.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With