I've been profiling some queries in an application I'm working on, and I came across a query that was retrieving more rows than necessary, the result set being trimmed down in the application code.
Changing a LEFT JOIN to an INNER JOIN trimmed the result set to just what was needed, and presumably would also be more performant (since less rows are selected). In reality, the LEFT JOIN'ed query was outperforming the INNER JOIN'ed, taking half the time to complete.
LEFT JOIN: (127 total rows, Query took 0.0011 sec)
INNER JOIN: (10 total rows, Query took 0.0024 sec)
(I ran the queries multiple times and those are averages).
Running EXPLAIN on both reveals nothing that explains the performance differences:
For the INNER JOIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE contacts index NULL name 302 NULL 235 Using where
1 SIMPLE lists eq_ref PRIMARY PRIMARY 4 contacts.list_id 1
1 SIMPLE lists_to_users eq_ref PRIMARY PRIMARY 8 lists.id,const 1
1 SIMPLE tags eq_ref PRIMARY PRIMARY 4 lists_to_users.tag_id 1
1 SIMPLE users eq_ref email_2 email_2 302 contacts.email 1 Using where
For the LEFT JOIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE contacts index NULL name 302 NULL 235 Using where
1 SIMPLE lists eq_ref PRIMARY PRIMARY 4 contacts.list_id 1
1 SIMPLE lists_to_users eq_ref PRIMARY PRIMARY 8 lists.id,const 1
1 SIMPLE tags eq_ref PRIMARY PRIMARY 4 lists_to_users.tag_id 1
1 SIMPLE users eq_ref email_2 email_2 302 contacts.email 1
And the query itself:
SELECT `contacts`.*, `lists`.`name` AS `group`, `lists`.`id` AS `group_id`, `lists`.`shared_yn`, `tags`.`name` AS `context`, `tags`.`id` AS `context_id`, `tags`.`color` AS `context_color`, `users`.`id` AS `user_id`, `users`.`avatar`
FROM `contacts`
LEFT JOIN `lists` ON lists.id=contacts.list_id
LEFT JOIN `lists_to_users` ON lists_to_users.list_id=lists.id AND lists_to_users.user_id='1' AND lists_to_users.creator='1'
LEFT JOIN `tags` ON tags.id=lists_to_users.tag_id
INNER JOIN `users` ON users.email=contacts.email
WHERE (contacts.user_id='1')
ORDER BY `contacts`.`name` ASC
(The clause that I'm talking about is the last INNER JOIN on the 'users' table)
The query runs on a MySQL 5.1 database, if it makes a difference.
Does anyone has a clue on why the LEFT JOIN'ed query outperforms the INNER JOIN'ed on in this case?
UPDATE: Due to Tomalak's suggestion that the small tables I'm using were making the INNER JOIN more complex, I'd created a test database with some mock data. The 'users' table is 5000 rows, and the contacts table is ~500,000 rows. The results are the same (also the timings haven't changed which is surprising when you consider that the tables are much bigger now).
I also ran ANALYZE and OPTIMIZE on the contacts table. Didn't make any discernible difference.
If you think that the implementation of LEFT JOIN is INNER JOIN + more work, then this result is confusing. What if the implementation of INNER JOIN is (LEFT JOIN + filtering)? Ah, it is clear now.
In the query plans, the only difference is this: users... extra: using where . This means filtering. There's an extra filtering step in the query with the inner join.
This is a different kind of filtering than is typically used in a where clause. It is simple to create an index on A to support this filtering action.
SELECT *
FROM A
WHERE A.ID = 3
Consider this query:
SELECT *
FROM A
LEFT JOIN B
ON A.ID = B.ID
WHERE B.ID is not null
This query is equivalent to inner join. There is no index on B that will help that filtering action. The reason is that the where clause is stating a condition on the result of the join, instead of a condition on B.
It's probably due to the INNER JOIN having to check each row in both tables to see if the column values (email in your case) match. The LEFT JOIN will return all from one table regardless. If it's indexed then it will know what to do faster too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With