Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Insert ignore" vs "select and insert"

I want to write a program add new item to table. This item has an unique key name and it can be created by one of 100 threads, so I need to make sure that it is inserted only once.

I have two ideas:

  1. Use insert ignore

  2. Fetch it from database via select then insert it to table if no returned row.

Which option is better? Is there an even more superior idea?

like image 241
user3245050 Avatar asked Jan 28 '14 15:01

user3245050


People also ask

Is insert ignore faster than insert?

Similar to deletes and replace into, with this scheme, “insert ignore” can be two orders of magnitude faster than insertions into a B-tree.

Should I use insert ignore?

The presence of a unique index in a table normally causes an error to occur if you insert a record into the table that duplicates an existing record in the column or columns that define the index. Use the INSERT IGNORE command rather than the INSERT command.

What insert ignore does?

Insert Ignore statement in MySQL has a special feature that ignores the invalid rows whenever we are inserting single or multiple rows into a table. We can understand it with the following explanation, where a table contains a primary key column. The primary key column cannot stores duplicate values into a table.


1 Answers

Late to the party, but I'm pondering something similar.

I created the following table to track active users on a license per day:

CREATE TABLE `license_active_users` (
  `license_active_user_id` int(11) NOT NULL AUTO_INCREMENT,
  `license_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `date` date NOT NULL,
  PRIMARY KEY (`license_active_user_id`),
  UNIQUE KEY `license_id` (`license_id`,`user_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

In other words, 1 primary key and 1 unique index across the remaining 3 columns.

I then inserted 1 million unique rows into the table.

Attempting to re-insert a subset (10,000 rows) of the same data yielded the following results:

  • INSERT IGNORE: 38 seconds
  • INSERT ... ON DUPLICATE KEY UPDATE: 40 seconds
  • if (!rowExists("SELECT ...")) INSERT: <2 seconds

If those 10,000 rows aren't already present in the table:

  • INSERT IGNORE: 34 seconds
  • INSERT ... ON DUPLICATE KEY UPDATE: 41 seconds
  • if (!rowExists("SELECT ...")) INSERT: 21 seconds

So the conclusion must be if (!rowExists("SELECT ...")) INSERT is fastest by far - at least for this particular table configuration.

The missing test is if (rowExists("SELECT ...")){ UPDATE } else { INSERT }, but I'll assume INSERT ... ON DUPLICATE KEY UPDATE is faster for this operation.

For your particular case, however, I would go with INSERT IGNORE because (as far as I'm aware) it's an atomic operation and that'll save you a lot of trouble when working with threads.

like image 106
Woodgnome Avatar answered Sep 28 '22 15:09

Woodgnome