Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?

Question

I have a table foo with (among 20 others) columns bar, baz and quux with indexes on baz and quux. The table has ~500k rows.

Why do the following to queries differ so much in speed? Query A takes 0.3s, while query B takes 28s.

Query A

select baz from foo
    where bar = :bar
    and quux = (select quux from foo where bar = :bar order by quux desc limit 1)

Explain

id  select_type table   type    possible_keys   key     key_len ref     rows    Extra
1   PRIMARY     foo     ref     quuxIdx         quuxIdx 9       const   2       "Using where"
2   SUBQUERY    foo     index   NULL            quuxIdx 9       NULL    1       "Using where"

Query B

select baz from foo
    where bar = :bar
    and quux = (select MAX(quux) from foo where bar = :bar)

Explain

id  select_type table   type    possible_keys   key     key_len ref     rows    Extra
1   PRIMARY     foo     ref     quuxIdx         quuxIdx 9       const   2       "Using where"
2   SUBQUERY    foo     ALL     NULL            NULL    NULL    NULL    448060  "Using where"

I use MySQL 5.1.34.

Mark Byers · Accepted Answer

You should add an index on (bar, quux).

Without this index, MySQL can't see how to perform the query efficiently so it has to choose from various inefficient query plans.

In the first example it scans the quux index and for each row found, looks up the corresponding value of bar in the original table. This takes twice as long to check each row, but it gets lucky that a row that has the correct value of bar is near the beginning of its scan and so it can stop. This could be because the value of bar you are searching for occurs frequently, so the chance of being lucky is very high. As a result it may only have to examine a handful of rows before it finds a match, so even though it takes twice as long to check each row, the fact that only a few rows are checked gives a massive overall saving. Since you have no index on bar, MySQL doesn't know in advance that the value :bar occurs frequently so it cannot know that this query will be fast.

In the second example it uses a different plan where it always scans the whole table. Each row is read directly from the table, without usage of an index. This means that each row read is fast, but because you have a lot of rows, it is slow overall. If none of the rows matched on :bar this would be the faster query plan. But if roughly 1% of rows have the desired value of bar, it will be (very) approximately 100 times slower to use this query plan compared to the above plan. Since you have no index on bar, MySQL doesn't know this in advance.

You could also just add the missing index and then both queries will go much faster.

Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?

Tags:

performance

mysql

query-optimization

Viktor Dahl

1 Answers

Mark Byers

Recent Activity

Donate For Us

Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?

Tags:

performance

mysql

query-optimization

Viktor Dahl

1 Answers

Mark Byers

Related questions

Recent Activity

Donate For Us