Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Average row length higher than possible

Tags:

mysql

storage

This is not a duplicate of Why is InnoDB table size much larger than expected? The answer to that question states that if I don't specify a primary key then 6 bytes is added to the row. I did specify a primary key, and there is more than 6 bytes to explain here.


I have a table that is expecting millions of records, so I paid close attention to the storage size of each column. Each row should take 15 bytes (smallint = 2 bytes, date = 3 bytes, datetime = 8 bytes)

CREATE TABLE archive (
  customer_id smallint(5) unsigned NOT NULL,
  calendar_date date NOT NULL,
  inserted datetime NOT NULL,
  value smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`customer_id`,`calendar_date`,`inserted`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The table now has a half million records in it and is taking more storage than expected. I ran this query to get more details from the system:

SELECT *
  FROM information_schema.TABLES
 WHERE table_name = 'archive';


information_schema.index_length = 0
information_schema.avg_row_length = 37
information_schema.engine = InnoDB
information_schema.table_type = BASE TABLE

HOW!?

I was expecting 15 bytes per row, and it's taking 37. Can anyone give me an idea of where to look next for an explanation? I've done a lot of reading on thais and I've seen some explanations for an extra 6 or 10 bytes being added to a row size, but that doesn't explain the 22 extra bytes.

One explanation is that indexes also take up storage. There are no indexes on this table.

One explanation is that the the information_schema.tables query returns an unreliable row count which would throw off the avg_row_length. I have checked the row count it is using against a count(*) query and it is only off by a little (1/20 of 1%), so that's not the whole story.

Another explanation is fragmentation. Of note, this table has been rebuilt from a sql dump, so there hasn't been any hammering of updates, inserts and deletes.

like image 371
efreed Avatar asked Dec 19 '22 21:12

efreed


1 Answers

  • Because avg_row_length is data_length / rows.

data_length is basically the total size of the table on disk. An InnoDB table is more than just a list of rows. So there's that extra overhead.

  • Because an InnoDB row is more than the data.

Similar to above, each row comes with some overhead. So that's going to add to the size of a row. An InnoDB table also isn't just a list of data crammed together. It needs a little extra empty space to work efficiently.

  • Because stuff is stored on disks in blocks and those blocks aren't always full.

Disks store things in usually 4K, 8K or 16K blocks. Sometimes things don't fit perfectly in those blocks, so you can get some empty space.

As we'll see below, MySQL is going to allocate the table in blocks. And it's going to allocate a lot more than it needs to avoid having to grow the table (which can be slow and lead to disk fragmentation which makes things even slower).


To illustrate this, let's start with an empty table.

mysql> create table foo ( id smallint(5) unsigned NOT NULL );
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          0 |              0 |
+-------------+------------+----------------+

It uses 16K, or four 4K blocks, to store nothing. The empty table doesn't need this space, but MySQL allocated it on the assumption that you're going to put a bunch of data in it. This avoids having to do an expensive reallocation on each insert.

Now let's add a row.

mysql> insert into foo (id) VALUES (1);
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          1 |          16384 |
+-------------+------------+----------------+

The table didn't get any bigger, there's all that unused space within those 4 blocks it has. There's one row which means an avg_row_length of 16K. Clearly absurd. Let's add another row.

mysql> insert into foo (id) VALUES (1);
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          2 |           8192 |
+-------------+------------+----------------+

Same thing. 16K is allocated for the table, 2 rows using that space. An absurd result of 8K per row.

As I insert more and more rows, the table size stays the same, it's using up more and more of its allocated space, and the avg_row_length comes closer to reality.

mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';                                                                     
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |       2047 |              8 |
+-------------+------------+----------------+

Here also we start to see table_rows become inaccurate. I definitely inserted 2048 rows.

Now when I insert some more...

mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       98304 |       2560 |             38 |
+-------------+------------+----------------+

(I inserted 512 rows, and table_rows has snapped back to reality for some reason)

MySQL decided the table needs more space, so it got resized and grabbed a bunch more disk space. avg_row_length just jumped again.

It grabbed a lot more space than it needs for those 512 rows, now it's 96K or 24 4K blocks, on the assumption that it will need it later. This minimizes how many potentially slow reallocations it needs to do and minimizes disk fragmentation.

This doesn't mean all that space was filled. It just means MySQL thought it was full enough to need more space to run efficiently. If you want an idea why that's so, look into how a hash table operates. I don't know if InnoDB uses a hash table, but the principle applies: some data structures operate best when there's some empty space.


The disk used by a table is directly related to the number of rows and types of columns in the table, but the exact formula is difficult to figure out and will change from version to version of MySQL. Your best bet is to do some empirical testing and resign yourself that you'll never get an exact number.

like image 190
Schwern Avatar answered Dec 28 '22 06:12

Schwern