Why is InnoDB table size much larger than expected?

Tags:

mysql

innodb

I'm trying to figure out storage requirements for different storage engines. I have this table:

CREATE TABLE  `mytest` (
  `num1` int(10) unsigned NOT NULL,
  KEY `key1` (`num1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

When I insert some values and then run SHOW TABLE STATUS, I get the following:

+----------------+--------+---------+------------+---------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name           | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length  | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+----------------+--------+---------+------------+---------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| mytest         | InnoDB |      10 | Compact    | 1932473 |             35 |    67715072 |                0 |     48840704 |   4194304 |           NULL | 2010-05-26 11:30:40 | NULL                | NULL       | latin1_swedish_ci |     NULL |                |         |
+----------------+--------+---------+------------+---------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+

Notice avg_row_length is 35. I am baffled that InnoDB would not make better use of space when I'm just storing a non-nullable integer.

I have run this same test on MyISAM, and by default MyISAM uses 7 bytes per row on this table. Running

ALTER TABLE mytest MAX_ROWS=50000000, AVG_ROW_LENGTH = 4;

finally gets MyISAM to use 5-byte rows correctly.

When I run the same ALTER TABLE statement for InnoDB, the avg_row_length does not change.

Why would such a large avg_row_length be necessary when only storing a 4-byte unsigned int?

asked May 26 '10 by alessandro ferrucci


3 Answers

InnoDB tables are clustered, which means that all data is contained in a B-Tree with the PRIMARY KEY as the key and all other columns as the payload.

Since you don't define an explicit PRIMARY KEY, InnoDB uses a hidden 6-byte column to sort the records on.

This, plus the overhead of the B-Tree organization (with extra non-leaf-level blocks), requires more space than sizeof(int) * num_rows.
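
To see where that space goes for this particular table, one option (a hedged sketch using the standard information_schema view, nothing specific to this setup) is to break down the figures that SHOW TABLE STATUS reports:

SELECT table_name,
       data_length,   -- size of the clustered index (the table itself)
       index_length   -- combined size of the secondary indexes (just key1 here)
FROM   information_schema.tables
WHERE  table_schema = DATABASE()
  AND  table_name   = 'mytest';

For an InnoDB table, Data_length is the clustered B-Tree (keyed on the hidden 6-byte row id in this case) and Index_length is the secondary index on num1, each entry of which also carries a copy of that hidden key.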

answered by Quassnoi


Here is some more info you might find useful.

InnoDB allocates data in terms of 16KB pages, so 'SHOW TABLE STATUS' will give inflated numbers for row size if you only have a few rows and the table is < 16K total. (For example, with 4 rows the average row size comes back as 4096.)
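
A throwaway sketch of that effect, using a hypothetical table name (tinytest) rather than the original one:

CREATE TABLE `tinytest` (
  `num1` int(10) unsigned NOT NULL,
  KEY `key1` (`num1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO tinytest (num1) VALUES (1), (2), (3), (4);

-- the whole table still fits in a single 16KB page, so the reported
-- Avg_row_length comes back as roughly 16384 / 4 = 4096, not sizeof(int)
SHOW TABLE STATUS LIKE 'tinytest';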

The extra 6 bytes per row for the "invisible" primary key is a crucial point when space is a big consideration. If your table has only one column, that's the ideal column to make the primary key, assuming the values in it are unique:

CREATE TABLE `mytest2`
       (`num1` int(10) unsigned NOT NULL primary key)
ENGINE=InnoDB DEFAULT CHARSET=latin1;

Using a PRIMARY KEY like this has a few benefits:

  1. No INDEX or KEY clause is needed, because you don't have a secondary index. The index-organized format of InnoDB tables gives you fast lookup based on the primary key value for free.
  2. You don't wind up with another copy of the NUM1 column data, which is what happens when that column is indexed explicitly.
  3. You don't wind up with another copy of the 6-byte invisible primary key values. The primary key values are duplicated in each secondary index. (That's also the reason why you probably don't want 10 indexes on a table with 10 columns, and you probably don't want a primary key that combines several different columns or is a long string column.)

So overall, sticking with just a primary key means less data associated with the table + indexes. To get a sense of overall data size, I like to run with

set global innodb_file_per_table = 1;

and examine the size of the data/database/*table*.ibd files. Each .ibd file contains the data for an InnoDB table and all its associated indexes.
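
For example, on a default Linux install the file to check would be something like the following (the path and database name are assumptions, so substitute your own datadir; note the setting only applies to tables created after it is enabled):

ls -lh /var/lib/mysql/<database>/mytest.ibd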

To quickly build up a big table for testing, I usually run a statement like so:

insert into mytest
select * from mytest;

This doubles the amount of data each time. In the case of the single-column table using a primary key, the values have to be unique, so I used a variation to keep them from colliding with each other:

insert into mytest2
select num1 + (select count(*) from mytest2) from mytest2;

This way, I was able to get average row size down to 25. The space overhead is based on the underlying assumption that you want to have fast lookup for individual rows using a pointer-style mechanism, and most tables will have a column whose values serve as pointers (i.e. the primary key) in addition to the columns with real data that gets summed, averaged, and displayed.

answered by Max Webster


In addition to Quassnoi's very fine answer, you should probably try it out using a significant data set.

What I'd do is load 1M rows of simulated production data, then measure the table size and use that as a guide.

That's what I've done in the past, anyway.
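
For what it's worth, a minimal sketch of that measurement step, reusing the question's mytest table and the doubling trick from the answer above as a stand-in for real simulated data:

-- seed one row, then repeat the doubling insert (roughly 20 times) to pass 1M rows
INSERT INTO mytest (num1) VALUES (1);
INSERT INTO mytest SELECT num1 FROM mytest;

-- measure the resulting footprint (table_rows is only an estimate for InnoDB)
SELECT table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS total_mb
FROM   information_schema.tables
WHERE  table_schema = DATABASE()
  AND  table_name   = 'mytest';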

answered by MarkR