I am trying to insert a lot of users into a MySQL database with two tables:
The first table contains the user data. An example INSERT looks like this (id is the primary key, mail is a unique key):
INSERT INTO users (id, mail, name)
VALUES (NULL, "[email protected]", "John Smith")
ON DUPLICATE KEY UPDATE name = VALUES(name)
The second table contains the group the user belongs to. It only stores two foreign keys, users_id and groups_id. An example query looks like this:
INSERT INTO users_groups (users_id, groups_id)
VALUES (LAST_INSERT_ID(), 1)
This setup works perfectly fine for small data sets. When I import large amounts of data (>1M rows), the INSERTs get slow. Obviously, it would be much better to do a batch insert:
INSERT INTO users (id, mail, name)
VALUES (NULL, "[email protected]", "John Smith"), (NULL, "[email protected]", "Anna Smith")
ON DUPLICATE KEY UPDATE name = VALUES(name)
and:
INSERT INTO users_groups (users_id, groups_id)
VALUES (LAST_INSERT_ID(), 1), (LAST_INSERT_ID(), 4)
The problem, of course, is that LAST_INSERT_ID() only returns one id (the first) of a batch INSERT.
So what I would need is a "nested" batch INSERT, which IMO does not exist in MySQL.
What can I do to make my INSERTs faster?
Bulk inserts by default get sequential auto-increment ids, so with this knowledge you can do your inserts like this:
INSERT INTO users (id, mail, name)
VALUES (NULL, "[email protected]", "John Smith"),
       (NULL, "[email protected]", "Anna Smith"),
       (...) # repeat n times
;
SET @LASTID = LAST_INSERT_ID()  # id of the FIRST row of the batch
;
INSERT INTO users_groups (users_id, groups_id)
VALUES (@LASTID + 0,   1),  # ids ascend from @LASTID to @LASTID + n-1
       (@LASTID + 1,   1),
       ...
       (@LASTID + n-2, 1),
       (@LASTID + n-1, 4)
;
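To make the offset arithmetic concrete, here is a minimal sketch for a batch of three new users (the third user and the group ids are made up for illustration):
# n = 3 new rows; with innodb_autoinc_lock_mode=1 their ids are consecutive
INSERT INTO users (id, mail, name)
VALUES (NULL, "[email protected]", "John Smith"),
       (NULL, "[email protected]", "Anna Smith"),
       (NULL, "[email protected]", "Max Power");

SET @LASTID = LAST_INSERT_ID();  # id assigned to John Smith, the first row

INSERT INTO users_groups (users_id, groups_id)
VALUES (@LASTID + 0, 1),  # John Smith
       (@LASTID + 1, 1),  # Anna Smith
       (@LASTID + 2, 4);  # Max Power
Note that this mapping only holds for rows that were actually inserted; with the question's ON DUPLICATE KEY UPDATE variant, rows that hit the duplicate-key path do not produce a new id, so the offsets shift for everything after them.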
For more information on bulk inserts and auto-increment handling, have a look at http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
Importantly, make sure that innodb_autoinc_lock_mode=1:
SHOW GLOBAL VARIABLES LIKE 'innodb_autoinc_lock_mode';
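If it is not 1, note that innodb_autoinc_lock_mode can only be set at server startup, not at runtime; a minimal config sketch, assuming you can edit my.cnf (or my.ini) and restart mysqld:
[mysqld]
innodb_autoinc_lock_mode = 1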
Otherwise consider wrapping your inserts in LOCK TABLES:
LOCK TABLES tbl_name WRITE;
... sqls ...
UNLOCK TABLES;
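As a sketch of how that could look with the tables from the question (lock every table you write to; while the lock is held no other session can insert into users, so the batch's ids stay consecutive even without the lock-mode guarantee):
LOCK TABLES users WRITE, users_groups WRITE;

INSERT INTO users (id, mail, name)
VALUES (NULL, "[email protected]", "John Smith"),
       (NULL, "[email protected]", "Anna Smith");

SET @LASTID = LAST_INSERT_ID();

INSERT INTO users_groups (users_id, groups_id)
VALUES (@LASTID + 0, 1),
       (@LASTID + 1, 4);

UNLOCK TABLES;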
If you're putting millions of known rows into a table all at once, consider using LOAD DATA INFILE, since it's intended for speed in just that type of scenario, as evidenced by this quote from the docs:
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.
And from Speed of INSERT Statements:
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
This is assuming that your source data is coming from, or could be provided as, a text file. If you have the group ID in the file as well, you might do something like this:
CREATE TEMPORARY TABLE load_users_groups (
mail VARCHAR(60),
name VARCHAR(60),
groupid INT,
PRIMARY KEY (mail, name)
);
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE load_users_groups
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'; -- use whatever optional syntax required to parse your file
INSERT INTO users (mail, name)
SELECT mail, name FROM load_users_groups
ON DUPLICATE KEY UPDATE name = VALUES(name);
INSERT INTO users_groups (users_id, groups_id)
SELECT users.id, load_users_groups.groupid
FROM users JOIN load_users_groups USING (mail, name);
DROP TEMPORARY TABLE load_users_groups;
Whether this approach ends up being faster than your current approach depends on whether you save more time using LOAD DATA INFILE
than you spend performing two additional INSERT ... SELECT
statements to move the data into your desired tables. You may want to tweak keys on the temporary table; I can't benchmark it for you based solely on the contents of your question. I'd be interested to know how it works out, though.
The documentation also has a decent number of tips for Bulk Data Loading for InnoDB Tables and Bulk Data Loading for MyISAM Tables. I won't go through them in detail, not least because you haven't given us any DDL or server info, but you may find it helpful to read through one or the other on your own time.
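As a taste of what those pages cover, a common InnoDB pattern is to relax per-row checks for the duration of the load and commit once at the end; a sketch, only appropriate when you trust the input data:
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;

-- ... run the LOAD DATA INFILE and INSERT ... SELECT statements here ...

COMMIT;
SET foreign_key_checks = 1;
SET unique_checks = 1;
SET autocommit = 1;
Be careful with unique_checks = 0 given the unique key on mail: while it is disabled, duplicate values can slip into a secondary unique index.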