Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two queries faster than than one?

I have a table with columns:

CREATE TABLE aggregates (
    a VARHCAR,
    b VARCHAR,
    c VARCHAR,
    metric INT
    KEY test (a, b, c, metric)
);

If I do a query like:

SELECT b, c, SUM(metric) metric
FROM aggregates
WHERE a IN ('a', 'couple', 'of', 'values')
GROUP BY b, c
ORDER BY b, c

The query takes 10 seconds, explain is:

+----+-------------+------------+-------+---------------+------+---------+------+--------+-----------------------------------------------------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref  | rows   | Extra                                                     |
+----+-------------+------------+-------+---------------+------+---------+------+--------+-----------------------------------------------------------+
|  1 | SIMPLE      | aggregates | range | test          | test | 767     | NULL | 582383 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+------------+-------+---------------+------+---------+------+--------+-----------------------------------------------------------+

If I also group by/order by column a, so it doesn't need temporary/filesort, but then do the same thing in another query myself:

SELECT b, c, SUM(metric) metric
FROM (
    SELECT a, b, c, SUM(metric) metric
    FROM aggregates
    WHERE a IN ('a', 'couple', 'of', 'values')
    GROUP BY a, b, c
    ORDER BY a, b, c
) t
GROUP BY b, c
ORDER BY b, c

The query takes 1 second and the explain is:

+----+-------------+------------+-------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+------------+-------+---------------+------+---------+------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL | NULL    | NULL |    252 | Using temporary; Using filesort |
|  2 | DERIVED     | aggregates | range | test          | test | 767     | NULL | 582383 | Using where; Using index        |
+----+-------------+------------+-------+---------------+------+---------+------+--------+---------------------------------+

Why is this? Why is it faster if I do the grouping in a separate, outer query, instead of it just doing it all in one?

like image 905
Jaka Jančar Avatar asked Sep 26 '11 14:09

Jaka Jančar


2 Answers

In the first case the index is used to find matching records, but cannot be used to sort as you do not include the leftmost column in the group/order by clauses. I would be interested to see both queries profiles:

set profiling =1;

run query 1;

run query 2;

show profile for query 1;

show profile for query 2;

like image 23
ggiroux Avatar answered Nov 16 '22 13:11

ggiroux


The way SQL works is the less data you have at each step the faster the query will perform. Because you are doing the grouping in the inner query first you are getting rid of a lot of data that the outside query no longer needs to process.

SQL optimisation should answer some of your questions. But the most important thing to remember is the more things you can eliminate early on in the query, the faster the query will run.

There is also a part of the database that tries different ways to run a query. This part of the server will most of the time choose the fastest path, but being more specific in your queries can really help it along. More on that on this page: Readings In Database Systems

Looking at your explain it seems the filesort on such a huge number of rows is probably hurting the query a lot. since the rows in the primary query (second query's outer scope) will be working off an in memory table.

like image 199
Serdalis Avatar answered Nov 16 '22 13:11

Serdalis