Speeding up row counting in MySQL

Suppose, for illustrative purposes, you are running a library using a simple MySQL "books" table with three columns:

(id, title, status)

  • id is the primary key
  • title is the title of the book
  • status could be an enum describing the book's current state (e.g. AVAILABLE, CHECKEDOUT, PROCESSING, MISSING)
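For reference, a minimal sketch of such a table. The column types and enum values are assumptions for illustration; the original question does not specify them:

```sql
CREATE TABLE books (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    status ENUM('AVAILABLE', 'CHECKEDOUT', 'PROCESSING', 'MISSING') NOT NULL,
    PRIMARY KEY (id),
    KEY idx_status (status)
) ENGINE=InnoDB;
```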

A simple query to report how many books fall into each state is:

SELECT status, COUNT(*) FROM books GROUP BY status 

or to specifically find how many books are available:

SELECT COUNT(*) FROM books WHERE status = 'AVAILABLE'

However, once the table grows to millions of rows, these queries take several seconds to complete. Adding an index to the "status" column doesn't appear to make a difference in my experience.
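For completeness, the index in question could be added as below (the index name is arbitrary). EXPLAIN will typically show "Using index" for the count query, but InnoDB still has to walk every matching index entry, which is why the index alone doesn't make the query fast:

```sql
ALTER TABLE books ADD INDEX idx_status (status);

EXPLAIN SELECT COUNT(*) FROM books WHERE status = 'AVAILABLE';
```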

Aside from periodically caching the results or explicitly updating summary info in a separate table each time a book changes state (via triggers or some other mechanism), are there any techniques for speeding up these kinds of queries? It seems that the COUNT queries end up looking at every row, and (without knowing more details) I'm a bit surprised that this information can't somehow be determined from the index.

UPDATE

Using the sample table (with an indexed "status" column) with 2 million rows, I benchmarked the GROUP BY query. Using the InnoDB storage engine, the query takes 3.0 - 3.2 seconds on my machine. Using MyISAM, the query takes 0.9 - 1.1 seconds. There was no significant difference between count(*), count(status), or count(1) in either case.

MyISAM is admittedly a bit faster, but I was curious to see if there was a way to make an equivalent query run much faster (e.g. 10-50 ms -- fast enough to be called on every webpage request for a low-traffic site) without the mental overhead of caching and triggers. It sounds like the answer is "there's no way to run the direct query quickly" which is what I expected - I just wanted to make sure I wasn't missing an easy alternative.

Kevin Ivarsen, asked Aug 26 '09


2 Answers

So the question is

are there any techniques for speeding up these kinds of queries?

Well, not really. A column-based storage engine would probably be faster with those SELECT COUNT(*) queries but it would be less performant for pretty much any other query.

Your best bet is to maintain a summary table via triggers. It doesn't have much overhead and the SELECT part will be instantaneous no matter how big the table. Here's some boilerplate code:

DELIMITER //

CREATE TRIGGER ai_books AFTER INSERT ON books
FOR EACH ROW
    UPDATE books_cnt SET total = total + 1 WHERE status = NEW.status //

CREATE TRIGGER ad_books AFTER DELETE ON books
FOR EACH ROW
    UPDATE books_cnt SET total = total - 1 WHERE status = OLD.status //

CREATE TRIGGER au_books AFTER UPDATE ON books
FOR EACH ROW
BEGIN
    IF (OLD.status <> NEW.status) THEN
        UPDATE books_cnt
        SET total = total + IF(status = NEW.status, 1, -1)
        WHERE status IN (OLD.status, NEW.status);
    END IF;
END //

DELIMITER ;
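The triggers above rely on a books_cnt summary table that the answer doesn't define. A minimal sketch of creating and seeding it might look like this (table and column names taken from the triggers; the enum values are assumptions):

```sql
CREATE TABLE books_cnt (
    status ENUM('AVAILABLE', 'CHECKEDOUT', 'PROCESSING', 'MISSING') NOT NULL,
    total INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (status)
);

-- Seed it once from the existing data
INSERT INTO books_cnt (status, total)
SELECT status, COUNT(*) FROM books GROUP BY status;

-- Reads are then a single-row primary key lookup
SELECT total FROM books_cnt WHERE status = 'AVAILABLE';
```

Note that the seeding step should run while writes are paused (or before the table sees traffic), otherwise the seeded counts can drift from reality before the triggers take over.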
Josh Davis, answered Oct 02 '22


MyISAM is actually pretty fast with count(*); the downside is that MyISAM storage is not that reliable and is best avoided where data integrity is critical.

InnoDB can be very slow to perform count(*) type queries, because it is designed to allow for multiple concurrent views of the same data. So at any point in time, it's not enough to go to the index to get the count.

From: http://www.mail-archive.com/[email protected]/msg120320.html

  • The database starts with 1000 records in it.
  • I start a transaction. You start a transaction.
  • I delete 50 records. You add 50 records.
  • I do a COUNT() and see 950 records. You do a COUNT() and see 1050 records.
  • I commit my transaction: the database now has 950 records to everyone but you.
  • You commit your transaction: the database has 1000 records again.
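The scenario above can be sketched as two concurrent sessions, assuming InnoDB's default REPEATABLE READ isolation level (the specific statements are illustrative, not from the original post):

```sql
-- Session A:
START TRANSACTION;
DELETE FROM books ORDER BY id LIMIT 50;
SELECT COUNT(*) FROM books;   -- A sees 950

-- Session B (started before A committed):
START TRANSACTION;
INSERT INTO books (title, status) VALUES ('New Book', 'AVAILABLE');
-- ...repeated for 50 rows...
SELECT COUNT(*) FROM books;   -- B sees 1050

-- Session A:
COMMIT;   -- everyone except B now sees 950 records

-- Session B:
COMMIT;   -- the table has 1000 records again
```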

How InnoDB keeps up with which records are "visible" or "modifiable" with respect to any transaction is through row-level locking, transaction isolation levels, and multi-versioning. http://dev.mysql.com/doc/refman/4.1/en/innodb-transaction-model.html http://dev.mysql.com/doc/refman/4.1/en/innodb-multi-versioning.html

That is what makes counting how many records each transaction can see not so straightforward.

So the bottom line is that you will need to look at caching the counts somehow, as opposed to going to the table, if you need to get at this information frequently and fast.

Sam Saffron, answered Oct 02 '22