Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL GROUP BY optimization - InnoDB vs MyISAM?

Question

Does anyone know why there is no GROUP BY optimization for MyISAM-based tables? (I'm using this version: 5.1.49-3)

Test tables

CREATE TABLE `_test2_innodb` (
    `i` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `n` smallint(5) unsigned NOT NULL,
    `t` int(10) unsigned NOT NULL,
    `v` smallint(6) NOT NULL,
    PRIMARY KEY (`i`),
    KEY `i_n` (`n`),
    KEY `i_t` (`t`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `_test2_myisam` (
    `i` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `n` smallint(5) unsigned NOT NULL,
    `t` int(10) unsigned NOT NULL,
    `v` smallint(6) NOT NULL,
    PRIMARY KEY (`i`),
    KEY `i_n` (`n`),
    KEY `i_t` (`t`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Test queries

SELECT MAX(i) FROM _test2_myisam GROUP BY n;

SELECT MAX(i) FROM _test2_innodb GROUP BY n;

Results

id, select_type, table, type, poss_keys, key, key_len, ref, rows, extra

1, SIMPLE, _test2_myisam , ALL, , , , , 19998, Using temporary; Using filesort

1, SIMPLE, _test2_innodb, index, , i_n, 2, , 20024, Using index

The problem is that if I'm using MyISAM, a full table scan will be done, which takes hours on large tables... And the MySQL documentation doesn't mention anything about table engines having different implementations ( http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html ). Does anyone know why is this handled differently internally?

(Note: no, it is not a good solution to switch to InnoDB) Thanks

like image 441
Kristof Kotai Avatar asked Oct 22 '22 04:10

Kristof Kotai


1 Answers

The difference between the two tables, despite the seemingly identical definitions, is that MyISAM tables are "heaps" while InnoDB tables are clustered organized tables, e.g. the clustered index is the table (usually the primary key, the (i) index here).

The other difference, that causes the different execution plans is that in InnoDB, all non-clustered indexes (your (n) index in this case), include also the columns of the clustered index, the (i) in this case, so no full table scan is needed.

In other words, the (n) InnoDB index is roughly equivalent to a (n, PK) MyISAM index.

The MyISAM engine has to either do a full table scan or an index scan of the (n) index and then also scan the table (to get the values of the i column). So, it chooses the first plan (full scan + filesort to find the MAX values).


Do the test again, after adding an index on (n, i) in the MyISAM table:

ALTER TABLE _test2_myisam
    ADD INDEX n_i (n, i) ;
like image 61
ypercubeᵀᴹ Avatar answered Nov 02 '22 23:11

ypercubeᵀᴹ