
"SELECT COUNT(*)" is slow, even with where clause

I'm trying to figure out how to optimize a very slow query in MySQL (I didn't design this):

    SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
    +----------+
    | COUNT(*) |
    +----------+
    |  3224022 |
    +----------+
    1 row in set (1 min 0.16 sec)

Comparing that to a full count:

    select count(*) from change_event;
    +----------+
    | count(*) |
    +----------+
    |  6069102 |
    +----------+
    1 row in set (4.21 sec)

The explain statement doesn't help me here:

    explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: me
             type: range
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 8
              ref: NULL
             rows: 4120213
            Extra: Using where; Using index
    1 row in set (0.00 sec)

OK, so it estimates that it needs to count roughly 4 million entries, but I could count lines in a file faster than that! I don't understand why MySQL is taking this long.

Here's the table definition:

    CREATE TABLE `change_event` (
      `change_event_id` bigint(20) NOT NULL default '0',
      `timestamp` datetime NOT NULL,
      `change_type` enum('create','update','delete','noop') default NULL,
      `changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL,
      `changed_object_id` varchar(255) default NULL,
      `changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00',
      `modified` datetime NOT NULL default '1000-01-01 00:00:00',
      `created` datetime NOT NULL default '1000-01-01 00:00:00',
      `pid` char(15) default NULL,
      `episode_pid` char(15) default NULL,
      `import_id` int(11) NOT NULL,
      `status` enum('success','failure') NOT NULL,
      `xml_diff` text,
      `node_digest` char(32) default NULL,
      PRIMARY KEY  (`change_event_id`),
      KEY `idx_change_events_changed_object_id` (`changed_object_id`),
      KEY `idx_change_events_episode_pid` (`episode_pid`),
      KEY `fk_import_id` (`import_id`),
      KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`),
      KEY `idx_change_event_status` (`status`),
      CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8

Version:

    $ mysql --version
    mysql  Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0

Is there something obvious I'm missing? (Yes, I've already tried "SELECT COUNT(change_event_id)", but there's no performance difference).

Ovid asked Feb 04 '09 15:02



1 Answer

InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.

Two things I would try:

  1. Run OPTIMIZE TABLE. This will ensure that the data pages are physically stored in sorted order, which could conceivably speed up a range scan on a clustered primary key.
  2. Create an additional non-primary index on just the change_event_id column. This will store a copy of that column in index pages, which will be much faster to scan. After creating it, check the explain plan to make sure the query is using the new index.

(You probably also want to make the change_event_id column bigint unsigned if it's incrementing from zero.)
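
The two suggestions above might look roughly like the following in MySQL. This is only a sketch against the table definition from the question: the index name idx_change_event_id is an assumption made for illustration, and whether the optimizer picks the new index (or whether it is actually faster) has to be checked on the real data.

```sql
-- 1. Rebuild the table. For InnoDB, OPTIMIZE TABLE is carried out as a
--    full table rebuild, so expect it to take a while on ~6 million rows.
OPTIMIZE TABLE change_event;

-- 2. Add a narrow secondary index on the primary key column.
--    The name idx_change_event_id is hypothetical, chosen for this sketch.
ALTER TABLE change_event
  ADD INDEX idx_change_event_id (change_event_id);

-- 3. Check which index the optimizer chooses for the original query.
EXPLAIN SELECT COUNT(*)
FROM change_event
WHERE change_event_id > '1212281603783391';

-- If the plan still shows "key: PRIMARY", an index hint makes it possible
-- to time the query against the secondary index for comparison.
SELECT COUNT(*)
FROM change_event FORCE INDEX (idx_change_event_id)
WHERE change_event_id > '1212281603783391';
```

In the EXPLAIN output, the key column should show the new index name if it is being used; the FORCE INDEX variant is only there to compare timings, not something to leave in application code without measuring first.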

ʞɔıu answered Sep 19 '22 21:09