I'm trying to figure out how to optimize a very slow query in MySQL (I didn't design this):
SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
+----------+
| COUNT(*) |
+----------+
|  3224022 |
+----------+
1 row in set (1 min 0.16 sec)
Comparing that to a full count:
select count(*) from change_event;
+----------+
| count(*) |
+----------+
|  6069102 |
+----------+
1 row in set (4.21 sec)
The explain statement doesn't help me here:
explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: me
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 4120213
        Extra: Using where; Using index
1 row in set (0.00 sec)
OK, it still thinks it needs roughly 4 million entries to count, but I could count lines in a file faster than that! I don't understand why MySQL is taking this long.
Here's the table definition:
CREATE TABLE `change_event` (
  `change_event_id` bigint(20) NOT NULL default '0',
  `timestamp` datetime NOT NULL,
  `change_type` enum('create','update','delete','noop') default NULL,
  `changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL,
  `changed_object_id` varchar(255) default NULL,
  `changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00',
  `modified` datetime NOT NULL default '1000-01-01 00:00:00',
  `created` datetime NOT NULL default '1000-01-01 00:00:00',
  `pid` char(15) default NULL,
  `episode_pid` char(15) default NULL,
  `import_id` int(11) NOT NULL,
  `status` enum('success','failure') NOT NULL,
  `xml_diff` text,
  `node_digest` char(32) default NULL,
  PRIMARY KEY (`change_event_id`),
  KEY `idx_change_events_changed_object_id` (`changed_object_id`),
  KEY `idx_change_events_episode_pid` (`episode_pid`),
  KEY `fk_import_id` (`import_id`),
  KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`),
  KEY `idx_change_event_status` (`status`),
  CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Version:
$ mysql --version
mysql  Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0
Is there something obvious I'm missing? (Yes, I've already tried "SELECT COUNT(change_event_id)", but there's no performance difference).
Because all data (including the row data itself) is stored in B-tree indexes, a SELECT COUNT(PK_COLUMN) still involves a considerable amount of I/O: it has to read all of the data pages. If you have a secondary index covering the primary-key column, the count can be performed with much less I/O, because the secondary index pages are far narrower than the clustered rows.
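For example, the full-table count can already be driven from one of the table's existing narrow secondary indexes instead of the clustered data; the FORCE INDEX hint below is only a sketch to make that choice explicit (the optimizer will often pick a small secondary index on its own):

-- Scan the narrow status index rather than the wide clustered rows.
SELECT COUNT(*) FROM change_event FORCE INDEX (idx_change_event_status);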
SELECT COUNT() can be combined with a WHERE clause: the WHERE clause restricts which rows are fed into the COUNT() function, so only rows matching the condition are counted.
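For instance, using the status column from the table definition above, a count restricted to failed events looks like this:

-- Only rows matching the WHERE condition are counted.
SELECT COUNT(*) FROM change_event WHERE status = 'failure';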
As for COUNT(*) versus COUNT(change_event_id): here there is no difference. COUNT(*) counts every row, while COUNT(column) counts rows where that column is not NULL; since change_event_id is the NOT NULL primary key, both return the same number and read the same pages.
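A quick check against this table, where change_event_id is the NOT NULL primary key, so both forms necessarily return the same result:

-- COUNT(*) counts every row; COUNT(change_event_id) counts rows where the
-- column is not NULL, which is identical here because it is the NOT NULL primary key.
SELECT COUNT(*)               FROM change_event;
SELECT COUNT(change_event_id) FROM change_event;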
InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.
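To get a rough feel for how much data such a range scan has to plough through, the table statistics are a quick check (Avg_row_length and Data_length approximate the volume of the clustered index):

-- Rough idea of how wide the clustered rows are and how much data
-- a primary-key range scan must read.
SHOW TABLE STATUS LIKE 'change_event'\G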
Two things I would try:

1. Run OPTIMIZE TABLE. This will ensure that the data pages are physically stored in sorted order, which could conceivably speed up a range scan on a clustered primary key.

2. Create an additional, non-primary index on just the change_event_id column. Because of the clustered primary key, a range scan over it has to read the full rows; a copy of the column in secondary index pages is much narrower and far faster to scan. After creating it, check the EXPLAIN plan to make sure the new index is actually being used (see the sketch below).

(You also probably want to make the change_event_id column BIGINT UNSIGNED if it's incrementing from zero.)
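A minimal sketch of both steps; the index name idx_change_event_id is just a placeholder, and the EXPLAIN is there to confirm the optimizer really switches to the new index:

-- 1. Rebuild the table so the data pages are physically in primary-key order.
OPTIMIZE TABLE change_event;

-- 2. Add a narrow, non-primary index on change_event_id
--    (index name is illustrative; pick whatever fits your naming scheme).
CREATE INDEX idx_change_event_id ON change_event (change_event_id);

-- Check that the range count is now resolved from the new index.
EXPLAIN SELECT COUNT(*) FROM change_event
WHERE change_event_id > 1212281603783391\G

-- If the IDs only ever count up from zero, an unsigned type doubles the usable range.
ALTER TABLE change_event
  MODIFY change_event_id BIGINT UNSIGNED NOT NULL DEFAULT 0;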