 

Why is count(*) query slow on some tables but not on others?

Tags: mysql, wamp

I've got a MySQL database running on a WAMP server that I'm using to do frequent pattern mining of Flickr data. While loading the data into the database, I ran a count query to determine how many images I had already loaded. I was surprised that it took 3 minutes 49 seconds for

select count(*) from image;

In a separate table, "concept", I am storing the list of tags that users give their images. A similar query on the "concept" table took 0.8 sec. The mystery is that both tables hold a comparable number of rows: select count(*) from image; returns 283,890 and select count(*) from concept; returns 213,357.

Here's the description of each table:

[Screenshot of MySQL console with table descriptions]

Clearly the "image" table has larger rows. I thought that perhaps "image" was too big to hold in memory based on this blog post, so I also tested the size of the tables using code from this answer.

SELECT table_name AS "Tables", 
round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB" 
FROM information_schema.TABLES 
WHERE table_schema = "$DB_NAME"
ORDER BY (data_length + index_length) DESC;

"image" is 179.98 MB, "concept" is 15.45 MB

I'm running mysql on a machine with 64 GB of RAM, so both these tables should easily fit. What am I missing that is slowing down my queries? And how can I fix it?

asked Mar 04 '15 by Cecilia


2 Answers

When performing SELECT COUNT(*) on an InnoDB table, MySQL must scan through an index to count the rows. In this case, your only index is the primary (clustered) index, so MySQL scans through that.

For a clustered index, the actual table data is stored in the index itself. Not including overhead, your image table is approximately 1973 bytes per row (I'm assuming a single-byte character set for both primary key columns). That's at most about 8 records per (16 KB) page, so about 35,486 pages. Your concept table is approximately 257 bytes per row. That's about 63 records per page, so about 3,386 pages. That's a huge difference in the amount of data that must be scanned.

It has to read each page entirely because the pages may not be entirely full.

Performance-wise, perhaps some of those pages are in memory and some are not. There are also marginal differences due to InnoDB's 15/16 page fill factor, but all of the numbers above should be considered approximations.
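If you want to sanity-check that arithmetic on your own server, the average row length is already tracked in the table statistics. A rough sketch (MySQL 5.6+, where the page size is exposed as @@innodb_page_size; substitute your schema name for $DB_NAME as in your size query):

SELECT table_name AS "Tables",
       avg_row_length AS "Avg bytes per row",
       FLOOR(@@innodb_page_size / avg_row_length) AS "Approx rows per page"
FROM information_schema.TABLES
WHERE table_schema = "$DB_NAME"
  AND table_name IN ('image', 'concept');

Note that avg_row_length includes overhead and free space, so expect slightly fewer rows per page than the back-of-the-envelope figures above.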

Solution

Adding a secondary index to the larger table should yield approximately the same performance for SELECT COUNT(*) as the smaller table. Of course, with another index to update, updates will be a bit slower.
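For example, a minimal sketch (idx_count and created_at are hypothetical names; any small column in your image table would serve):

ALTER TABLE image ADD INDEX idx_count (created_at);

Afterwards, EXPLAIN SELECT COUNT(*) FROM image should list the new index in its key column, confirming the optimizer picked the narrower index for the scan.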

For improved performance, shorten your primary key, because secondary indexes include the indexed column(s) plus the full primary key.
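As a hypothetical illustration, suppose image were keyed on a wide composite of string columns; swapping in a compact surrogate key shrinks every secondary index as a side effect (column names invented for the sketch, and note that this ALTER rewrites the whole table):

ALTER TABLE image
  DROP PRIMARY KEY,
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (id),
  ADD UNIQUE KEY uq_owner_photo (owner_id, photo_id);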

If you only need an estimated number of rows, you can use the rows value from one of the following, which uses the table statistics instead of scanning the index:

SHOW TABLE STATUS LIKE 'image'

or

EXPLAIN SELECT COUNT(*) FROM image
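The same estimate is also available from information_schema, which can be more convenient in scripts (again substituting your schema for $DB_NAME; the statistics are updated lazily, so the number will drift from the true count):

SELECT table_rows
FROM information_schema.TABLES
WHERE table_schema = "$DB_NAME"
  AND table_name = 'image';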
answered Nov 04 '22 by Marcus Adams


If you're looking for a ballpark number rather than an exact count, then the Rows column from SHOW TABLE STATUS may be good enough. It's not always accurate for InnoDB tables, but it sounds like a rough estimate is fine for your use case anyway.

answered Nov 04 '22 by dpw