I have a table with columns:
CREATE TABLE aggregates (
a VARHCAR,
b VARCHAR,
c VARCHAR,
metric INT
KEY test (a, b, c, metric)
);
If I do a query like:
SELECT b, c, SUM(metric) metric
FROM aggregates
WHERE a IN ('a', 'couple', 'of', 'values')
GROUP BY b, c
ORDER BY b, c
The query takes 10 seconds, explain is:
+----+-------------+------------+-------+---------------+------+---------+------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | aggregates | range | test | test | 767 | NULL | 582383 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+------------+-------+---------------+------+---------+------+--------+-----------------------------------------------------------+
If I also group by/order by column a, so it doesn't need temporary/filesort, but then do the same thing in another query myself:
SELECT b, c, SUM(metric) metric
FROM (
SELECT a, b, c, SUM(metric) metric
FROM aggregates
WHERE a IN ('a', 'couple', 'of', 'values')
GROUP BY a, b, c
ORDER BY a, b, c
) t
GROUP BY b, c
ORDER BY b, c
The query takes 1 second and the explain is:
+----+-------------+------------+-------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+------+--------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 252 | Using temporary; Using filesort |
| 2 | DERIVED | aggregates | range | test | test | 767 | NULL | 582383 | Using where; Using index |
+----+-------------+------------+-------+---------------+------+---------+------+--------+---------------------------------+
Why is this? Why is it faster if I do the grouping in a separate, outer query, instead of it just doing it all in one?
In the first case the index is used to find matching records, but cannot be used to sort as you do not include the leftmost column in the group/order by clauses. I would be interested to see both queries profiles:
set profiling =1;
run query 1;
run query 2;
show profile for query 1;
show profile for query 2;
The way SQL works is the less data you have at each step the faster the query will perform. Because you are doing the grouping in the inner query first you are getting rid of a lot of data that the outside query no longer needs to process.
SQL optimisation should answer some of your questions. But the most important thing to remember is the more things you can eliminate early on in the query, the faster the query will run.
There is also a part of the database that tries different ways to run a query. This part of the server will most of the time choose the fastest path, but being more specific in your queries can really help it along. More on that on this page: Readings In Database Systems
Looking at your explain it seems the filesort on such a huge number of rows is probably hurting the query a lot. since the rows in the primary query (second query's outer scope) will be working off an in memory table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With