Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why mysql count(*) has better performance than count(id)

Tags:

mysql

At first I want compare count(*) and count(id), which has better performance?

mysql version

5.6.21-1~dotdeb.1-log

table info

PRIMARY KEY (`id`),
KEY `is_availability` (`is_availability`,`is_del`)
ENGINE=InnoDB AUTO_INCREMENT=48993819 DEFAULT CHARSET=utf8 
  1. compare without where condition

    select count(*) from op_log;
    +----------+
    | count(*) |
    +----------+
    | 48989975 |
    +----------+
    1 row in set (10.02 sec)
    
    select count(id) from op_log ;
    +-----------+
    | count(id) |
    +-----------+
    |  48989990 |
    +-----------+
    1 row in set (12.05 sec)
    

count(*) better than count(id)

  1. compare with where condition

    select count(*) from op_log where is_availability=1;
    +----------+
    | count(*) |
    +----------+
    | 48990038 |
    +----------+
    1 row in set (15.86 sec)
    select count(id) from op_log where is_availability=1;
    +-----------+
    | count(id) |
    +-----------+
    |  48990096 |
    +-----------+
    1 row in set (17.13 sec)
    

    count(*) still better than count(id)

So if I could draw the conclusion that count(*) has better performance than count(id) and why is this?

From High Performance MySQL, I got

if mysql knows some col cannot be NULL, it will optimize count(col) to count(*) internally

So I suspect that the time spend more is used to do this optimization work.

like image 337
zhuguowei Avatar asked Feb 26 '16 13:02

zhuguowei


2 Answers

Often COUNT(*) would have marginally better performance. COUNT(id) needs to check if id is not NULL for it to work. That means that it needs to read the value (as well as the small overhead for checking NULLness).

I would expect the timings to be the same in most databases when id is a clustered primary key. However, perhaps the MySQL optimizer doesn't bother avoiding the NULL check even for columns that are declared NOT NULL.

Note: when doing timings you have to be very careful about starting with a cold cache. In your case, the faster query seems to be the first run, so caching doesn't seem a likely explanation for the difference in performance.

like image 92
Gordon Linoff Avatar answered Oct 13 '22 05:10

Gordon Linoff


COUNT(*) will count all rows of your table

COUNT(column) will count non-NULLs columns only of your table.

You can check this article:

The thing is count(*) query can use covering index even while count(col) can’t. Of course you can extend index to be (i,val) and get query to be index covered again but I would use this workaround only if you can’t change the query (ie it is third party application) or in case column name is in the query for reason, and you really need count of non-NULL values.

like image 30
Rahul Tripathi Avatar answered Oct 13 '22 07:10

Rahul Tripathi