 

Why is InnoDB's SHOW TABLE STATUS so unreliable?

Tags:

mysql

innodb

I know that you shouldn't rely on the values returned by InnoDB's SHOW TABLE STATUS, in particular the row count and the average data length.

But I thought they might at least be accurate values taken at some point, which InnoDB then refreshes only during an ANALYZE TABLE or some other infrequent event.

Instead, I'm seeing that I can run SHOW TABLE STATUS on the same table 5 times in 5 seconds and get completely different numbers each time, even though the table has no insert/delete activity in between.

Where are these values actually coming from? Are they just corrupt in InnoDB?

carpii asked Dec 24 '11 12:12


2 Answers

The official MySQL 5.1 documentation acknowledges that InnoDB does not give accurate statistics with SHOW TABLE STATUS. Whereas MyISAM tables keep an exact row count and other metadata that can be read directly, the InnoDB engine keeps no such counter. It stores table data and indexes together, by default in the shared tablespace files /var/lib/mysql/ibdata* (or in per-table .ibd files when innodb_file_per_table is enabled).

InnoDB therefore has no quick way to look up the number of rows; an exact count requires scanning an index.
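To make the contrast concrete, here is a toy Python model (not MySQL code; the class names are hypothetical). A MyISAM-like table answers a row count from a counter it updates on every write, while an InnoDB-like table has no such counter and must walk every record, which is roughly what SELECT COUNT(*) amounts to on InnoDB.

```python
class CountedTable:
    """MyISAM-like toy table: maintains an exact row counter on every write."""

    def __init__(self):
        self.rows = []
        self.row_count = 0

    def insert(self, row):
        self.rows.append(row)
        self.row_count += 1

    def delete(self, row):
        self.rows.remove(row)
        self.row_count -= 1

    def count(self):
        # O(1): just read the stored counter
        return self.row_count


class ScannedTable:
    """InnoDB-like toy table: no stored counter, so counting is a full scan."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def delete(self, row):
        self.rows.remove(row)

    def count(self):
        # O(n): visit every record, as an index scan for COUNT(*) would
        return sum(1 for _ in self.rows)


if __name__ == "__main__":
    for table in (CountedTable(), ScannedTable()):
        for i in range(1000):
            table.insert(i)
        table.delete(42)
        print(type(table).__name__, table.count())  # both report 999
```

Both give the same answer; the difference is only in how much work the count costs.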

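SHOW TABLE STATUS reports inconsistent row counts because InnoDB estimates the 'Rows' value: it samples a small number of random index pages and extrapolates from them to the whole table. In MySQL 5.1, with the default innodb_stats_on_metadata=ON, these statistics are recalculated on metadata statements such as SHOW TABLE STATUS, and each recalculation samples different random pages, so the estimate changes from run to run even when the table does not. The MySQL documentation acknowledges that the 'Rows' value for InnoDB can differ from the actual count by as much as 40% to 50%.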
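The effect of sampling can be sketched with a small Python simulation. It assumes a simplified estimator (average records per sampled leaf page times the number of leaf pages); InnoDB's real algorithm differs in detail, and the page count and fill levels here are made up. Sampling 8 pages mirrors the default innodb_stats_sample_pages of the MySQL 5.1/5.5 era.

```python
import random


def make_leaf_pages(n_pages, seed=0):
    """Simulated leaf pages holding a varying number of records each."""
    rng = random.Random(seed)
    return [rng.randint(20, 120) for _ in range(n_pages)]


def estimate_rows(pages, n_sample, rng):
    """Extrapolate a total row count from a random sample of pages.

    Simplified model: average records per sampled page x total page count.
    """
    sample = rng.sample(pages, n_sample)
    return round(sum(sample) / n_sample * len(pages))


if __name__ == "__main__":
    pages = make_leaf_pages(10_000)
    exact = sum(pages)
    rng = random.Random()  # unseeded: a fresh sample on every call
    for _ in range(5):
        est = estimate_rows(pages, 8, rng)  # 8 sampled pages per "SHOW TABLE STATUS"
        print(f"exact={exact}  estimate={est}  error={abs(est - exact) / exact:.1%}")
```

The table never changes between the five iterations, yet each estimate differs, which matches the behaviour described in the question. Sampling every page would reproduce the exact count, but at the cost of a full scan.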

The MySQL documentation (in its section on restrictions on InnoDB tables) suggests the query cache as a good way to make repeated row counts cheap when a table does not change often, but it does not spell out how. A short walkthrough follows.

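First, check that the server supports the query cache and whether it is active:

mysql> SHOW VARIABLES LIKE 'have_query_cache';
mysql> SHOW VARIABLES LIKE 'query_cache%';

have_query_cache is read-only: NO means the server was built without query cache support, and it cannot be turned on through configuration. If it is YES but query_cache_size is 0 (the MySQL 5.1 default) or query_cache_type is OFF, enable the cache by adding the following lines to the [mysqld] section of /etc/my.cnf and then restarting mysqld:

[mysqld]
query_cache_size  = 1048576
query_cache_type  = 1
query_cache_limit = 1048576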

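(For more information see http://dev.mysql.com/doc/refman/5.1/en/query-cache.html. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this approach only applies to older servers.)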
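Behaviourally, the query cache stores the result of a SELECT keyed by its byte-for-byte statement text, and any write to a table discards every cached result that read from it. The Python class below is a toy model of that behaviour only (the name ToyQueryCache is hypothetical, and the real cache also keys on things like the current database and character set).

```python
class ToyQueryCache:
    """Behavioural model of the MySQL query cache, not its real implementation.

    Results are keyed by the exact statement text, and any write to a table
    drops every cached result that read from that table.
    """

    def __init__(self):
        self.entries = {}  # statement text -> (tables read, cached result)
        self.hits = 0      # plays the role of the Qcache_hits counter
        self.inserts = 0   # plays the role of the Qcache_inserts counter

    def select(self, sql, tables, run_query):
        if sql in self.entries:
            self.hits += 1
            return self.entries[sql][1]
        result = run_query()  # cache miss: actually execute the query
        self.entries[sql] = (frozenset(tables), result)
        self.inserts += 1
        return result

    def invalidate(self, table):
        # Called on every INSERT/UPDATE/DELETE against `table`
        self.entries = {
            sql: entry for sql, entry in self.entries.items() if table not in entry[0]
        }


if __name__ == "__main__":
    rows = list(range(1000))
    cache = ToyQueryCache()
    sql = "SELECT COUNT(*) FROM my_innodb_table"
    print(cache.select(sql, ["my_innodb_table"], lambda: len(rows)))  # 1000 (scan)
    print(cache.select(sql, ["my_innodb_table"], lambda: len(rows)))  # 1000 (cache hit)
    rows.append(1000)
    cache.invalidate("my_innodb_table")
    print(cache.select(sql, ["my_innodb_table"], lambda: len(rows)))  # 1001 (scan again)
```

Because every write invalidates the cached count, this only pays off for tables that are written rarely.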

You can check the cache's status counters (hits, inserts, free memory and so on) with

mysql> SHOW STATUS LIKE 'Qcache%';

Now count the rows with an ordinary SELECT:

SELECT COUNT(*) FROM my_innodb_table;

On InnoDB, COUNT(*) returns an exact count (as seen by the current transaction) by scanning an index. With the query cache enabled, the result is stored in the cache, and later executions of the byte-for-byte identical statement are answered from the cache instead of scanning the table again, until a write to my_innodb_table invalidates the cached result. Other cacheable SELECT statements behave the same way.

If you instead want the total row count alongside a LIMITed result, add SQL_CALC_FOUND_ROWS to that query:

SELECT SQL_CALC_FOUND_ROWS * FROM my_innodb_table LIMIT 10;

and read the total, once only and immediately afterwards, with

SELECT FOUND_ROWS();

which returns the number of rows the previous query would have returned without its LIMIT clause. Do not combine SQL_CALC_FOUND_ROWS with COUNT(*): that query returns a single row, so FOUND_ROWS() would report 1. (SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17.)
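The semantics of FOUND_ROWS() can be modelled in a few lines of Python. This is a hypothetical sketch, not MySQL code: produced_rows stands for the rows a query yields before LIMIT is applied. With SQL_CALC_FOUND_ROWS, FOUND_ROWS() reports the count without LIMIT; without it, FOUND_ROWS() reports the number of rows actually returned.

```python
def run_select(produced_rows, limit=None, calc_found_rows=False):
    """Toy model of a SELECT followed by SELECT FOUND_ROWS().

    produced_rows: the rows the query yields before any LIMIT is applied.
    Returns (result_set, found_rows).
    """
    result = produced_rows if limit is None else produced_rows[:limit]
    # With SQL_CALC_FOUND_ROWS, FOUND_ROWS() counts the rows the query would
    # have returned without LIMIT; otherwise it counts the rows actually returned.
    found = len(produced_rows) if calc_found_rows else len(result)
    return result, found


if __name__ == "__main__":
    table = list(range(250))

    # SELECT SQL_CALC_FOUND_ROWS * FROM my_innodb_table LIMIT 10; SELECT FOUND_ROWS();
    page, total = run_select(table, limit=10, calc_found_rows=True)
    print(len(page), total)  # 10 250

    # SELECT SQL_CALC_FOUND_ROWS COUNT(*) FROM my_innodb_table; SELECT FOUND_ROWS();
    counted, found = run_select([len(table)], calc_found_rows=True)
    print(counted, found)  # [250] 1
```

The second case shows why FOUND_ROWS() is only useful for the total behind a LIMITed row query: an aggregate like COUNT(*) produces just one row.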

4 revs, 2 users 98% answered Sep 24 '22 20:09


InnoDB does not keep accurate statistics, including the table's row count, because of the row multiversioning (MVCC) it uses to provide transactions. The actual number of rows depends on which transaction is asking and at what isolation level (an uncommitted transaction may have inserted or deleted records), and different transactions can run at different isolation levels. The question "how many records are there?" therefore has a single correct answer only when no transactions are running, which makes keeping a single counter for the row count or data length nearly impossible.
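A heavily simplified Python model of this point follows. It ignores InnoDB's real read views, undo logs and isolation-level differences; RowVersion and count_visible are hypothetical names. Each row version records which transaction inserted it and which (if any) deleted it, and a reader counts only the changes made by itself or by transactions in its snapshot of committed transactions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class RowVersion:
    key: int
    created_by: int                   # id of the inserting transaction
    deleted_by: Optional[int] = None  # id of the deleting transaction, if any


def count_visible(rows: List[RowVersion], committed: FrozenSet[int], own_trx: int) -> int:
    """Count the rows a transaction sees under a simplified snapshot rule.

    A change is visible if it was made by the reader itself or by a
    transaction in the reader's snapshot of committed transactions.
    """
    def seen(trx: Optional[int]) -> bool:
        return trx is not None and (trx == own_trx or trx in committed)

    return sum(1 for r in rows if seen(r.created_by) and not seen(r.deleted_by))


if __name__ == "__main__":
    # Transaction 1 inserted 100 rows and committed.
    rows = [RowVersion(k, created_by=1) for k in range(1, 101)]
    # Transaction 5 (still uncommitted) deletes 10 rows and inserts 5 new ones.
    for r in rows[:10]:
        r.deleted_by = 5
    rows += [RowVersion(k, created_by=5) for k in range(101, 106)]

    snapshot = frozenset({1})  # only transaction 1 has committed
    print(count_visible(rows, snapshot, own_trx=7))  # another reader: 100
    print(count_visible(rows, snapshot, own_trx=5))  # transaction 5 itself: 95
```

Two transactions looking at the same physical rows at the same moment get different, equally correct counts, so there is no single number a table-level counter could store.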

Read more in the MySQL manual's section on restrictions on InnoDB tables.

Maxim Krizhanovsky answered Sep 21 '22 20:09