I have two queries, the first one (inner join) is super fast, and the second one (left join) is super slow. How do I make the second query fast?
EXPLAIN SELECT saved.email FROM saved INNER JOIN finished ON finished.email = saved.email;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE finished index NULL email 258 NULL 32168 Using index
1 SIMPLE saved ref email email 383 func 1 Using where; Using index
EXPLAIN SELECT saved.email FROM saved LEFT JOIN finished ON finished.email = saved.email;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE saved index NULL email 383 NULL 40971 Using index
1 SIMPLE finishedindex NULL email 258 NULL 32168 Using index
Edit: I have added table info for both tables down below.
CREATE TABLE `saved` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`slug` varchar(255) DEFAULT NULL,
`email` varchar(127) NOT NULL,
[omitted fields include varchar, text, longtext, int],
PRIMARY KEY (`id`),
KEY `slug` (`slug`),
KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=56329 DEFAULT CHARSET=utf8;
CREATE TABLE `finished` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`slug` varchar(255) DEFAULT NULL,
`submitted` int(11) DEFAULT NULL,
`status` int(1) DEFAULT '0',
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
[omitted fields include varchar, text, longtext, int],
PRIMARY KEY (`id`),
KEY `assigned_user_id` (`assigned_user_id`),
KEY `event_id` (`event_id`),
KEY `slug` (`slug`),
KEY `email` (`email`),
KEY `city_id` (`city_id`),
KEY `status` (`status`),
KEY `recommend` (`recommend`),
KEY `pending_user_id` (`pending_user_id`),
KEY `submitted` (`submitted`)
) ENGINE=MyISAM AUTO_INCREMENT=33063 DEFAULT CHARSET=latin1;
With INNER JOIN, MySQL generally will start with the table with the smallest number of rows. In this case, it starts with table finished
and does a look up for the corresponding record in saved
using the index on saved.email
.
For a LEFT JOIN, (excluding some optimizations) MySQL generally joins the records in order (starting with the left most table). In this case, MySQL starts with the table saved
, then attempts to find each corresponding record in finished
. Since there is no usable index on finished.email
, it must do a full scan for each look up.
Edit
Now that you posted your schema, I can see that MySQL is ignoring the index (finished.email
) when going from utf8
to latin1
character set. You've not posted the character sets and collations for each column, so I'm going by the default character set for the table. The collations must be compatible in order for MySQL to use the index.
MySQL can coerce (upgrade) a latin1
collation, which is very limited, up to a utf8
collation such as unicode_ci
(so the first query can use the index on saved.email
by upgrading latin1
collation to utf8
), but the opposite is not true (the second query can't use the index on finished.email
since it can't downgrade a utf8
collation down to latin1
).
The solution is to change both email columns to a compatible collation, perhaps most easily by making them identical character sets and collations.
The LEFT JOIN
query is slower than the INNER JOIN
query because it's doing more work.
From the EXPLAIN output, it looks like MySQL is doing nested loop join. (There's nothing wrong with nested loops; I think that's the only join operation that MySQL uses in version 5.5 and earlier.)
For the INNER JOIN
query, MySQL is using an efficient "ref"
(index lookup) operation to locate the matching rows.
But for the LEFT JOIN
query, it looks like MySQL is doing a full scan of the index to find the matching rows. So, with the nested loops join operation, MySQL is doing a full index scan scan for each row from the other table. So, that's on the order of tens of thousands of scans, and each of those scans is inspecting tens of thousands of rows.
Using the estimated row counts from the EXPLAIN output, that's going to require (40971*32168=) 1,317,955,128 string comparisons.
The INNER JOIN
query avoids a lot of that work, so it's a lot faster. (It's avoiding all those string comparisons by using an index operation.
-- LEFT JOIN
id select table type key key_len ref rows Extra
-- ------ -------- ----- ----- ------- ---- ----- ------------------------
1 SIMPLE saved index email 383 NULL 40971 Using index
1 SIMPLE finished index email 258 NULL 32168 Using index
-- INNER JOIN
id select table type key key_len ref rows Extra
-- ------ -------- ----- ----- ------- ---- ----- ------------------------
1 SIMPLE finished index email 258 NULL 32168 Using index
1 SIMPLE saved ref email 383 func 1 Using where; Using index
^^^^^ ^^^^ ^^^^^ ^^^^^^^^^^^^
NOTE: Markus Adams spied the difference in characterset in the email
columns CREATE TABLE
statements that were added to your question.
I believe that it's the difference in the characterset that's preventing MySQL from using an index for your query.
Q2: How do I make the LEFT JOIN query faster?
A: I don't believe it's going to be possible to get that specific query to run faster, without a schema change, such as changing the characterset of the two email columns to match.
The only affect that the "outer join" to the finished
table looks like it is to produce "duplicate" rows whenever more than one matching row is found. I'm not understanding why the outer join is needed. Why not just get rid of it altogether, and just do:
SELECT saved.email FROM saved
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With