Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to improve INSERT performance on a very large MySQL table

I am working on a large MySQL database and I need to improve INSERT performance on a specific table. This one contains about 200 Millions rows and its structure is as follows:

(a little premise: I am not a database expert, so the code I've written could be based on wrong foundations. Please help me to understand my mistakes :) )

CREATE TABLE IF NOT EXISTS items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    key VARCHAR(10) NOT NULL,
    busy TINYINT(1) NOT NULL DEFAULT 1,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,

    PRIMARY KEY (id, name),
    UNIQUE KEY name_key_unique_key (name, key),
    INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;

Every day I receive many csv files in which each line is composed by the pair "name;key", so I have to parse these files (adding values created_at and updated_at for each row) and insert the values into my table. In this one, the combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:

CREATE TEMPORARY TABLE temp_items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL, 
    key VARCHAR(10) NOT NULL, 
    busy TINYINT(1) NOT NULL DEFAULT 1,  
    created_at DATETIME NOT NULL, 
    updated_at DATETIME NOT NULL,  
    PRIMARY KEY (id) 
    ) 
ENGINE=MyISAM;

LOAD DATA LOCAL INFILE 'file_to_process.csv' 
INTO TABLE temp_items
FIELDS TERMINATED BY ',' 
OPTIONALLY ENCLOSED BY '\"' 
(name, key, created_at, updated_at); 

INSERT INTO items (name, key, busy, created_at, updated_at) 
(
    SELECT temp_items.name, temp_items.key, temp_items.busy, temp_items.created_at, temp_items.updated_at 
    FROM temp_items
) 
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();

DROP TEMPORARY TABLE temp_items;

The code just shown allows me to reach my goal but, to complete the execution, it employs about 48 hours, and this is a problem. I think that this poor performance are caused by the fact that the script must check on a very large table (200 Millions rows) and for each insertion that the pair "name;key" is unique.

How can I improve the performance of my script?

Thanks to all in advance.

like image 900
jbdotdev Avatar asked Aug 03 '13 12:08

jbdotdev


People also ask

How do you improve the fastness of an insert on a huge table?

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.

Why is insert so slow MySQL?

Remove existing indexes - Inserting data to a MySQL table will slow down once you add more and more indexes. Therefore, if you're loading data to a new table, it's best to load it to a table without any indexes, and only then create the indexes, once the data was loaded.

Which is more efficient to insert data into a table in MySQL?

LOAD DATA (all forms) is more efficient than INSERT because it loads rows in bulk.

Which is faster insert or update MySQL?

Insertion is inserting a new key and update is updating the value of an existing key. If that is the case (a very common case) , update would be faster than insertion because update involves an indexed lookup and changing an existing value without touching the index.


Video Answer


2 Answers

You can use the following methods to speed up inserts:

  1. If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

  2. When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

  3. Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.

Reference: MySQL.com: 8.2.4.1 Optimizing INSERT Statements

like image 182
JAR Avatar answered Oct 17 '22 09:10

JAR


Your linear key on name and the large indexes slows things down.

LINEAR KEY needs to be calculated every insert. http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html

can you show us some example data of file_to_process.csv maybe a better schema should be build.

Edit looked more closely

INSERT INTO items (name, key, busy, created_at, updated_at) 
(
    SELECT temp_items.name, temp_items.key, temp_items.busy, temp_items.created_at, temp_items.updated_at 
    FROM temp_items
) 

this will proberly will create a disk temp table, this is very very slow so you should not use it to get more performance or maybe you should check some mysql config settings like tmp-table-size and max-heap-table-size maybe these are misconfigured.

like image 4
Raymond Nijland Avatar answered Oct 17 '22 11:10

Raymond Nijland