MIN/MAX vs ORDER BY and LIMIT
To follow up on this question: I found some results very different from what Sean McSomething describes:
I have a table with about 300M rows.
Select max(foo) from bar;
takes about 15 sec. to run
Select foo from bar order by foo desc limit 1;
takes 3 sec. to run
Sean's statement "It looks like MIN() is the way to go - it's faster in the worst case, indistinguishable in the best case" just doesn't hold for this case...but I have no idea why. Can anyone offer an explanation?
Edit: Since I am unable to show the table's structure here: assume that bar is a table in an ndb_cluster with no relations, foo is an arbitrary data point with no index.
MAX(x) - 1 simply means the max value of x in the table minus one. You can always use parenthesis and aliases ( as some_cool_name ) to make thing clearer, or to change names in the result. But the first syntax is perfectly valid.
The statement 'select 1' from any table name means that it returns only 1. For example, If any table has 4 records then it will return 1 four times.
Discussion: To find the max value of a column, use the MAX() aggregate function; it takes as its argument the name of the column for which you want to find the maximum value. If you have not specified any other columns in the SELECT clause, the maximum will be calculated for all records in the table.
The column-Name that you specify in the ORDER BY clause does not need to be the SELECT list.
To avoid a full pass, add an INDEX
on foo
column.
I came across this question and thought I'd add what I've found. Note that the columns are indexed. I'm running on MariaDB 10.2.14.
I have a query which looks like SELECT MAX(created) FROM tbl WHERE group=0 AND created IS NOT NULL
. There's an index on (group,created)
(both are ints, but created can be NULL). There are many entries with group=0
, not many where created IS NULL
. tbl
is using the Aria storage engine.
EXPLAIN shows the index is being used and gives a row count of 46312, with extra saying "Using where; Using index"
Running the query takes around 0.692s, but the status has something interesting:
Handler_read_key: 1
Handler_read_next: 45131
Handler_read_prev: 0
This seems to suggest that the key is being fully scanned for the maximum; using MIN instead of MAX seems to give similar results. This seems to suggest that MIN/MAX actually can't make use of the optimisation to just pick the first/last entry of the index here.
However, if the query is changed to SELECT created FROM tbl WHERE group=0 AND created IS NOT NULL ORDER BY created DESC LIMIT 1
, whilst the query seems to take about the same amount of time to run, and EXPLAIN shows the same info, the status shows:
Handler_read_key: 1
Handler_read_next: 0
Handler_read_prev: 0
I get similar results if the order by is changed to ASC. It seems to me that using an ORDER BY...LIMIT can skip an index scan, which could lead to faster queries if there are many rows which match the index condition, if my understanding is correct.
Note that for the above results, there's enough RAM and cache allocated for holding all indexes in cache, so, presumably, index scans are fast.
I haven't done any experiments with other conditions (different MySQL versions and storage engines) but I suppose the moral of this story is, checking status of queries via SHOW SESSION STATUS
may help provide answers to these things.
At least in this case, the ORDER BY...LIMIT may be more efficient than MIN/MAX even when an index can be used.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With