One of the things my app does a fair amount is:
select count(distinct id) from x;
with id
the primary key for table x
. With MySQL 5.1 (and 5.0), it looks like this:
mysql> explain SELECT count(distinct id) from x;
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
| 1 | SIMPLE | x | index | NULL | ix_blahblahblah | 1 | NULL | 1234567 | Using index |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
On InnoDB, this isn't exactly blazing, but it's not bad, either.
This week I'm trying out MySQL 5.5.11, and was surprised to see that the same query is many times slower. With the cache primed, it takes around 90 seconds, compared to 5 seconds before. The plan now looks like this:
mysql> explain select count(distinct id) from x;
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
| 1 | SIMPLE | x | range | NULL | PRIMARY | 4 | NULL | 1234567 | Using index for group-by (scanning) |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
One way to make it go fast again is to use select count(id) from x
, which is safe because id
is a primary key, but I'm going through some abstraction layers (like NHibernate) that make this a non-trivial task.
I tried analyze table x
but it didn't make any appreciable difference.
It looks kind of like this bug, though it's not clear what versions that applies to, or what's happening (nobody's touched it in a year yet it's "serious/high/high").
Is there any way, besides simply changing my query, to get MySQL to be smarter about this?
UPDATE:
As requested, here's a way to reproduce it, more or less. I wrote this SQL script to generate 1 million rows of dummy data (takes 10 or 15 minutes to run):
delimiter $$
drop table if exists x;
create table x (
id integer unsigned not null auto_increment,
a integer,
b varchar(100),
c decimal(9,2),
primary key (id),
index ix_a (a),
index ix_b (b),
index ix_c (c)
) engine=innodb;
drop procedure if exists fill;
create procedure fill()
begin
declare i int default 0;
while i < 1000000 do
insert into x (a,b,c) values (1,"one",1.0);
set i = i+1;
end while;
end$$
delimiter ;
call fill();
When it's done, I observe this behavior:
select count(distinct id) from x
select count(id) from x
select count(distinct id) from x
select count(id) from x
EDIT:
If I modify the query in 5.5 by saying
select count(distinct id) from x force index (ix_a);
it runs much faster. Indexes b and c also work (to varying degrees), and even forcing index PRIMARY
helps.
I'm not making any promises that this will be better but, as a possible work around, you could try:
SELECT COUNT(*)
FROM (SELECT id
FROM x
GROUP BY id) t
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With