Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is MySQL InnoDB so much slower at full table scans than MyISAM?

EDIT

OP has acknowledged a mistake when profiling PostgreSQL in his answer below. I am updating this question to reflect the comparison between MyISAM & InnoDB.

Hello,

I ran a test against MySQL InnoDB, MyISAM, and PostgreSQL to see how well each of these engines performed doing full table scans to understand what the response time might be for the cases where we inevitably need to let this happen.

The tests were conducted on an Intel Core 2 Quad Q6600 @ 2.4Ghz w/ 4GB RAM and a 7200 RPM HD with a 16MB cache.

MySQL Version was 5.0.67-community-nt-log 32-bit, PGSQL version was 8.4.

I wrote a small script to generate 5 million rows of data in a 4 column table. These are the create table statements used in MySQL and PGSQL:

-- InnoDB

CREATE TABLE sample_innodb (
id integer unsigned not null,
vc1 varchar(200) not null,
vc2 varchar(200) not null,
vc3 varchar(200) not null
) ENGINE=InnoDB;

-- MyISAM

CREATE TABLE sample_isam (
id integer unsigned not null,
vc1 varchar(200) not null,
vc2 varchar(200) not null,
vc3 varchar(200) not null
) ENGINE=MyISAM;

-- PostgreSQL

create table sample_pgsql (
id integer not null,
vc1 varchar(200) not null,
vc2 varchar(200) not null,
vc3 varchar(200) not null
);

This is the script I used to generate data for these tables:

var chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXTZabcdefghiklmnopqrstuvwxyz'.split('');

function randomString(length) {
 var str = '';
 for (var i = 0; i < length; i++) {
    str += chars[Math.floor(Math.random() * chars.length)];
 }

   return str;
}

function genrow(idv, vcv1, vcv2, vcv3) {
 return idv + "," + vcv1 + "," + vcv2 + "," + vcv3;
}

function gentable(numrows) {
 for (var i = 0; i < numrows; i++) {
    var row = 
        genrow(i,
               randomString(10),
               randomString(20),
               randomString(30));

    WScript.Echo(row);
  }
}

gentable(5000000);

I ran this script on Windows with the command:

cscript.exe /nologo test.js > data.csv

You can load this data into MySQL with these commands:

LOAD DATA LOCAL INFILE 'data.csv'
INTO TABLE sample_innodb
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vc1, vc2, vc3);

LOAD DATA LOCAL INFILE 'data.csv'
INTO TABLE sample_isam
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vc1, vc2, vc3);

You can load the data into PGSQL with this command:

copy sample_pgsql (id, vc1, vc2, vc3) from 'data.csv' with delimiter ','

I used this query for timing to try and force a worst case table scan scenario:

MySQL
select count(*) from [table] 
where vc1 like '%blah0%' and vc2 like '%blah1%' and vc3 like '%blah2%';
PostgreSQL
select count(*) from [table] 
where vc1 ilike '%blah0%' and vc2 ilike '%blah1%' and vc3 ilike '%blah2%';

I ran this query multiple times to get an average time to complete, leaving out the first run to get everything primed in memory.

The results were as follows:

  • InnoDB - 8.56s
  • MyISAM - 1.84s
  • PGSQL - 8.4s

Question

Why is it that InnoDB and MyISAM are so far apart in terms of time to complete a full table scan? Am I just missing something obvious in the MySQL config? I've used MySQL for years and have had no problems with it as long as my problems were constrained to the set of "indexes can solve this" problems.

Partitioning will obviously also solve this problem, but at a much higher cost.

For reference, here are my config files for MySQL and PGSQL:

MYSQL CONFIG

[client]
port=3306

[mysql]
default-character-set=utf8

[mysqld]
port=3306
basedir="C:/Program Files/MySQL/MySQL Server 5.0/"
datadir="C:/Program Files/MySQL/MySQL Server 5.0/Data/"
default-character-set=utf8
default-storage-engine=INNODB
log="c:/logs/mysql/mysqld.log"
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
max_connections=700
query_cache_size=0M
table_cache=1400
tmp_table_size=16M
thread_cache_size=34

myisam_max_sort_file_size=100G
myisam_sort_buffer_size=8M
key_buffer_size=200M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=208K

innodb_additional_mem_pool_size=2M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=1M
innodb_buffer_pool_size=200M
innodb_log_file_size=18M
innodb_thread_concurrency=10

PGSQL CONFIG

listen_addresses = '*'        # what IP address(es) to listen on;
                # comma-separated list of addresses;
                # defaults to 'localhost', '*' = all
                # (change requires restart)
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)

shared_buffers = 32MB            # min 128kB
                # (change requires restart)
temp_buffers = 12MB            # min 800kB
maintenance_work_mem = 32MB        # min 1MB

log_destination = 'stderr'        # Valid values are combinations of
                # stderr, csvlog, syslog and eventlog,
                # depending on platform.  csvlog
                # requires logging_collector to be on.

logging_collector = on        # Enable capturing of stderr and csvlog
                # into log files. Required to be on for
                # csvlogs.
                # (change requires restart)

log_line_prefix = '%t'            # special values:
                #   %u = user name
                #   %d = database name
                #   %r = remote host and port
                #   %h = remote host
                #   %p = process ID
                #   %t = timestamp without milliseconds
                #   %m = timestamp with milliseconds
                #   %i = command tag
                #   %c = session ID
                #   %l = session line number
                #   %s = session start timestamp
                #   %v = virtual transaction ID
                #   %x = transaction ID (0 if none)
                #   %q = stop here in non-session
                #        processes
                #   %% = '%'
                # e.g. '<%u%%%d> '

datestyle = 'iso, mdy'
lc_messages = 'English_United States.1252'            # locale for system error message
                # strings
lc_monetary = 'English_United States.1252'            # locale for monetary formatting
lc_numeric = 'English_United States.1252'            # locale for number formatting
lc_time = 'English_United States.1252'                # locale for time formatting

default_text_search_config = 'pg_catalog.english'

Also to understand how large these data sets actually are in MySQL, here's a show table status \G on them, in case this helps:

*************************** 1. row ***************************
           Name: sample_innodb
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 5000205
 Avg_row_length: 100
    Data_length: 500154368
Max_data_length: 0
   Index_length: 149700608
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-02-02 17:27:50
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 497664 kB

*************************** 2. row ***************************
           Name: sample_isam
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 5000000
 Avg_row_length: 72
    Data_length: 360006508
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-02-02 17:27:50
    Update_time: 2010-02-02 17:37:23
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
like image 602
Anonymous Avatar asked Feb 03 '10 19:02

Anonymous


People also ask

Why is InnoDB slower than MyISAM?

In terms of data queries (SELECT), InnoDB is the clear winner, but when it comes to database writes (INSERT and UPDATE), MyISAM is somewhat faster. However, the lower speed of InnoDB is more than compensated for by its transaction protocol.

Why MyISAM is faster than InnoDB?

MyISAM vs InnoDBStorage: Performance The performance of InnoDB for large volumes of data is better as compared to MyISAM. MyISAM doesn't support transactional properties and is faster to read. As compared to InnoDB, the performance for a high volume of data is less.

Why is InnoDB so slow?

That's because it's not designed to cache data. InnoDB caches every data page and index page (and its grandmother) it touches. If your InnoDB Buffer Pool is too small, you could be caching pages, invalidating pages, and removing pages all in one query.

Is InnoDB better than MyISAM?

InnoDB supports transactional properties. It offers a higher speed of code writing with rollbacks and commits. The performance of InnoDB with large volumes of data is better than that of MyISAM.


1 Answers

On my server with your configuration my base performance is as follows:

  • InnoDB: 5.71s
  • MyISAM: 2.50s

This is not bad in my book, but it can use some tweaking.

Your InnoDB performance can be improved on several fronts:

Increase innodb_buffer_pool_size

  • This is the single most important InnoDB configuration variable. Ideally it should be 70-80% of your available RAM on a server dedicated to MySQL and InnoDB.
  • Increasing innodb_buffer_pool_size on my server to 2G (more than enough for this test) decreased the InnoDB time to 4.60s

Make id the PRIMARY KEY

  • InnoDB clusters it's data based on PRIMARY KEY. When you don't declare one, InnoDB generates a random one implicitly. Having a sequential primary key (id) is faster than a random one.
  • Making id the PRIMARY KEY on my server decreased the InnoDB time to 3.80s

Upgrade your MySQL/InnoDB

With MySQL 5.1+ MySQL supports pluggable storage engines. Specifically the new InnoDB Plugin.

The new InnoDB engine provides many numerous performance enhancements that may have significant affects on this particular type of query.

Of note:

  • As of MySQL 5.1.38, the InnoDB Plugin is included in MySQL
  • As of MySQL 5.1.43, the InnoDB Plugin is not only included, but is the default engine for MySQL
like image 77
hobodave Avatar answered Oct 12 '22 23:10

hobodave