I'm writing a script that generates a report based on a query that uses several tables joined together. One of the inputs to the script is going to be a list of the fields that are required on the report. Depending on the fields requested, some of the tables might not be needed. My question is: is there a [significant] performance penalty for including a join when if it is not referenced in a SELECT or WHERE clause?
Consider the following tables:
mysql> SELECT * FROM `Books`;
+----------------------+----------+
| title | authorId |
+----------------------+----------+
| Animal Farm | 3 |
| Brave New World | 2 |
| Fahrenheit 451 | 1 |
| Nineteen Eighty-Four | 3 |
+----------------------+----------+
mysql> SELECT * FROM `Authors`;
+----+----------+-----------+
| id | lastName | firstName |
+----+----------+-----------+
| 1 | Bradbury | Ray |
| 2 | Huxley | Aldous |
| 3 | Orwell | George |
+----+----------+-----------+
Does
SELECT
`Authors`.`lastName`
FROM
`Authors`
WHERE
`Authors`.`id` = 1
Outperform:
SELECT
`Authors`.`lastName`
FROM
`Authors`
JOIN
`Books`
ON `Authors`.`id` = `Books`.`authorId`
WHERE
`Authors`.`id` = 1
?
It seems to me that MySQL should just know to ignore the JOIN completely, since the table is not referenced in the SELECT or WHERE clause. But somehow I doubt this is the case. Of course, this is a really basic example. The actual data involved will be much more complex.
And really, it's not a terribly huge deal... I just need to know if my script needs to be "smart" about the joins, and only include them if the fields requested will rely on them.
The order you specify the joins will hardly ever matter to the query optimizer (nor if you use CTEs, sub-queries, or whatnot). However, every query optimizer has its quirks, so the order of joins could matter in very specific cicumstances.
Basically, join order DOES matter because if we can join two tables that will reduce the number of rows needed to be processed by subsequent steps, then our performance will improve.
Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. There's an example of this in the subqueries lesson. Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
This isn't actually unused since it means that only Authors that exist in Books are included in the result set.
JOIN
`Books`
ON `Authors`.`id` = `Books`.`authorId`
However if you "knew" that every Author existed in Book than there would be some performance benefit in removing the join but it would largely depend on idexes and the number of records in the table and the logic in the join (especially when doing data conversions)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With