Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does MySQL eliminate common subexpressions between SELECT and HAVING/GROUP BY clause

I often see people answer MySQL questions with queries like this:

SELECT DAY(date), other columns
FROM table
GROUP BY DAY(date);

SELECT somecolumn, COUNT(*)
FROM table
HAVING COUNT(*) > 1;

I always like to give the column an alias and refer to that in the GROUP BY or HAVING clause, e.g.

SELECT DAY(date) AS day, other columns
FROM table
GROUP BY day;

SELECT somecolumn, COUNT(*) AS c
FROM table
HAVING c > 1;

Is MySQL smart enough to notice that the expressions in the later clauses are the same as in SELECT, and only do it once? I'm not sure how to test this -- EXPLAIN doesn't show any difference, but it doesn't seem to show how it's doing the grouping or filtering in the first place; it seems mainly useful for optimizing joins and WHERE clauses.

I tend to be pessimistic about MySQL optimization, so I like to give it all the help I can.

like image 659
Barmar Avatar asked May 23 '14 08:05

Barmar


2 Answers

I think this can be tested using sleep() function,
for example take a look at this demo: http://sqlfiddle.com/#!2/0bc1b/1

Select * FROM t;

| X |
|---|
| 1 |
| 2 |
| 2 |

SELECT x+sleep(1)
FROM t
GROUP BY x+sleep(1);

SELECT x+sleep(1) As name
FROM t
GROUP BY name;

Execution times of both queries are about 3000 ms ( 3 seconds ).
There are 3 records in the table, and for each record the query sleeps for 1 second only,
so it means that the expression is evaluated only once for each record, not twice.

like image 196
krokodilko Avatar answered Oct 07 '22 07:10

krokodilko


After consulting with one of the MySQL engineers, I proffer this lengthy answer.

  • Caching - no part of a query is 'remembered' for later use in that (or subsequent) query. (Contrast: the Query cache.)
  • Common subexpression elimination - no. This is a common Compiler technique, but MySQL does not use it. Example: (a-b)*(a-b) will do the subtract twice.
  • Removal of a constant from a loop - yes, with limitations. This is another Compiler technique.
  • A variety of SQL-centric hacks - yes; see below.
  • Re-evaluation of a subquery - it depends. Also, the Optimizer is gradually getting better.
  • VIEWs - it depends. There are still cases where a VIEW is destined to perform worse than the equivalent SELECT. Example: no condition pushdown into a UNION in a VIEW. Actually, this is more a matter of delayed action.
  • I think that some newer versions of MariaDB have a "subquery cache".

(Caveat: I do not have 100% confidence in any of my answer, but I do believe that most of it is correct, as of MySQL 5.7, MariaDB 10.1, etc)

Think of a multi-row SELECT as a loop. Many, maybe all, "deterministic" expressions are evaluated once. Example: Constant date expressions, even involving function calls. But...

NOW() is specifically evaluated once at the beginning of a query. Furthermore, the value is passed to Slaves when replicating. That is, by the time the query is stored on a slave, NOW() could be out of date. (SYSDATE() is another animal.)

Especially with the advent of only_full_group_by, GROUP BY needs to know if it matches the SELECT expressions. So, this looks for similar code.

HAVING and ORDER BY can use aliases from the SELECT list (unlike WHERE and GROUP BY). So SELECT expr AS x ... HAVING expr seems to reevaluate expr, but SELECT expr AS x ... HAVING x seems to reach for the already-evaluated expr.

The Windowing functions of MariaDB 10.2 have some pretty severe restrictions on where they can/cannot be reused; I don't have a complete picture of them yet.

Generally, none of this matters -- the re-evaluation of an expression (DATE(date) or even COUNT(*)) will get the same answer. Furthermore, the rummaging through the rows is usually much more costly than expression evaluation. So, unless you have a good stopwatch, you won't tell the difference.

like image 42
Rick James Avatar answered Oct 07 '22 08:10

Rick James