Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL 5.5 "select distinct" is really slow

One of the things my app does a fair amount is:

select count(distinct id) from x;

with id the primary key for table x. With MySQL 5.1 (and 5.0), it looks like this:

mysql> explain SELECT count(distinct id) from x;
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
| id | select_type | table    | type  | possible_keys | key             | key_len | ref  | rows    | Extra       |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
|  1 | SIMPLE      | x        | index | NULL          | ix_blahblahblah | 1       | NULL | 1234567 | Using index |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+

On InnoDB, this isn't exactly blazing, but it's not bad, either.

This week I'm trying out MySQL 5.5.11, and was surprised to see that the same query is many times slower. With the cache primed, it takes around 90 seconds, compared to 5 seconds before. The plan now looks like this:

mysql> explain select count(distinct id) from x;
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows    | Extra                               |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
|  1 | SIMPLE      | x        | range | NULL          | PRIMARY | 4       | NULL | 1234567 | Using index for group-by (scanning) |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+

One way to make it go fast again is to use select count(id) from x, which is safe because id is a primary key, but I'm going through some abstraction layers (like NHibernate) that make this a non-trivial task.

I tried analyze table x but it didn't make any appreciable difference.

It looks kind of like this bug, though it's not clear what versions that applies to, or what's happening (nobody's touched it in a year yet it's "serious/high/high").

Is there any way, besides simply changing my query, to get MySQL to be smarter about this?

UPDATE:

As requested, here's a way to reproduce it, more or less. I wrote this SQL script to generate 1 million rows of dummy data (takes 10 or 15 minutes to run):

delimiter $$
drop table if exists x;
create table x (
  id integer unsigned not null auto_increment,
  a integer,
  b varchar(100),
  c decimal(9,2),
  primary key (id),
  index ix_a (a),
  index ix_b (b),
  index ix_c (c)
) engine=innodb;
drop procedure if exists fill;
create procedure fill()
begin
  declare i int default 0;
  while i < 1000000 do
    insert into x (a,b,c) values (1,"one",1.0);
    set i = i+1;
  end while;
end$$
delimiter ;
call fill();

When it's done, I observe this behavior:

  • 5.1.48
    • select count(distinct id) from x
      • EXPLAIN is: key: ix_a, Extra: Using index
      • takes under 1.0 sec to run
    • select count(id) from x
      • EXPLAIN is: key: ix_a, Extra: Using index
      • takes under 0.5 sec to run
  • 5.5.11
    • select count(distinct id) from x
      • EXPLAIN is: key: PRIMARY, Extra: Using index for group-by
      • takes over 7.0 sec to run
    • select count(id) from x
      • EXPLAIN is: key: ix_a, Extra: Using index
      • takes under 0.5 sec to run

EDIT:

If I modify the query in 5.5 by saying

select count(distinct id) from x force index (ix_a);

it runs much faster. Indexes b and c also work (to varying degrees), and even forcing index PRIMARY helps.

like image 568
Ken Avatar asked Oct 25 '22 14:10

Ken


1 Answers

I'm not making any promises that this will be better but, as a possible work around, you could try:

SELECT COUNT(*)
    FROM (SELECT id
              FROM x
              GROUP BY id) t
like image 153
Joe Stefanelli Avatar answered Oct 27 '22 11:10

Joe Stefanelli