Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which MySQL JOIN query is more efficient?

Tags:

sql

mysql

Given the following table structure:

CREATE TABLE user (
   uid INT(11) auto_increment,
   name VARCHAR(200),
   PRIMARY KEY(uid)
);
CREATE TABLE user_profile(
   uid INT(11),
   address VARCHAR(200),
   PRIMARY KEY(uid),
   INDEX(address)
);

Which join query is more efficient: #1,

SELECT u.name FROM user u INNER JOIN user_profile p ON u.uid = p.uid WHERE p.address = 'some constant'

or #2:

SELECT u.name FROM user u INNER JOIN (SELECT uid FROM user_profile WHERE p.address = 'some constant') p ON u.uid = p.uid

How much is the difference in efficiency?

like image 962
Leonth Avatar asked Feb 28 '23 01:02

Leonth


2 Answers

The first syntax is generally more efficient.

MySQL buffers the derived queries so using the derived query robs the user_profile of possibility to be a driven table in the join.

Even if the user_profile is leading, the subquery results should be buffered first which implies a memory and performance impact.

A LIMIT applied to the queries will make the first query much faster which is not true for the second one.

Here are the sample plans. There is an index on (val, nid) in the table t_source:

First query:

EXPLAIN
SELECT  *
FROM    t_source s1
JOIN    t_source s2
ON      s2.nid = s1.id
WHERE   s2.val = 1

1, 'SIMPLE', 's1', 'ALL', 'PRIMARY', '', '', '', 1000000, ''
1, 'SIMPLE', 's2', 'ref', 'ix_source_val,ix_source_val_nid,ix_source_vald_nid', 'ix_source_val_nid', '8', 'const,test.s1.id', 1, 'Using where'

Second query:

EXPLAIN
SELECT  *
FROM    t_source s1
JOIN    (
        SELECT  nid
        FROM    t_source s2
        WHERE   val = 1
        ) q
ON      q.nid = s1.id

1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 100000, ''
1, 'PRIMARY', 's1', 'ref', 'PRIMARY', 'PRIMARY', '4', 'q.nid', 10000, 'Using where'
2, 'DERIVED', 's2', 'ref', 'ix_source_val,ix_source_val_nid,ix_source_vald_nid', 'ix_source_vald_nid', '4', '', 91324, 'Using index'

As you can see, only a part of the index is used in the second case, and q is forced to be leading.

Update:

Derived queries (which is what this question concerns) are not to be confused with the subqueries.

While MySQL is not able to optimize derived queries (those used in the FROM clause), the subqueries (those used with IN or EXISTS) are treated much better.

See these articles in my blog for more detail:

  • Passing parameters in MySQL: IN list vs. temporary table
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
like image 186
Quassnoi Avatar answered Mar 05 '23 16:03

Quassnoi


Looking at the explain queries for these selects, we get this: (row headers are id, select_type, table, type, possible_keys, key, key_len, ref, rows, extra)

1   SIMPLE  u   system  PRIMARY NULL    NULL    NULL    1   
1   SIMPLE  p   const   PRIMARY,address PRIMARY 4   const   1   

And the EXPLAIN for the second...

1   PRIMARY u   system  PRIMARY NULL    NULL    NULL    1   
1   PRIMARY <derived2>  system  NULL    NULL    NULL    NULL    1    
2   DERIVED p   ref address address 201     1   Using where

So, the first query is simpler, and simpler is usually more efficient.

However, from your CREATEs, it would be vastly more efficient to add the address field to the user table. Since profile is 1-to-1 with the user table (on uid), it is possible to combine the tables and still keep the schema normalized.

Then, your query would be

SELECT u.name FROM user u WHERE u.address = 'some constant'

and the explain shows

1   SIMPLE  u   ref address address 201 const   1   Using where, using filesort

Oddly, the simplified schema uses filesorting, which is bad if you have lots of rows.

More on explain: http://dev.mysql.com/doc/refman/5.0/en/explain.html

like image 33
davethegr8 Avatar answered Mar 05 '23 16:03

davethegr8