How do unique indexes really work and avoid collisions?

Suppose I have a collection where I create a unique index on a field:

db.users.createIndex({username: 1}, {unique:true})

What happens if two documents with the same username are SIMULTANEOUSLY being inserted in the collection?
How does the database prevent the collision? I mean which one gets inserted and which results in an error?
Assuming the inserts are really SIMULTANEOUS there is no way for the database to know that two duplicates are being inserted, right?
So, what's really going on?
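
For concreteness, this is what the collision looks like from the mongo shell: whichever insert is applied first succeeds, and the other fails with a duplicate key error (a minimal sketch, using the collection and field from the question; the exact error message varies by server version):

db.users.insert({username: "alice"})
// WriteResult({ "nInserted" : 1 })

db.users.insert({username: "alice"})
// WriteResult({
//   "nInserted" : 0,
//   "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error ..." }
// })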

asked May 13 '15 by Core_dumped


People also ask

How do unique indexes work?

A unique index ensures that the values in the index key columns are unique. A unique constraint also guarantees that no duplicate values can be inserted into the column(s) on which the constraint is created. When a unique constraint is created a corresponding unique index is automatically created on the column(s).
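
In MongoDB the unique constraint and the unique index are one and the same object; the unique flag is visible on the index specification (a minimal shell sketch, continuing with the users collection from the question; output trimmed):

db.users.getIndexes()
// [
//   { "key" : { "_id" : 1 }, "name" : "_id_" },
//   { "key" : { "username" : 1 }, "name" : "username_1", "unique" : true }
// ]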

Does unique index improve performance?

In addition to enforcing the uniqueness of data values, a unique index can also be used to improve data retrieval performance during query processing.
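
As a sketch of that retrieval benefit, explain() on an equality query against the indexed field reports an index scan (IXSCAN) instead of a full collection scan (output heavily trimmed; exact fields vary by server version):

db.users.find({username: "alice"}).explain("executionStats")
// "winningPlan" : {
//   "stage" : "FETCH",
//   "inputStage" : { "stage" : "IXSCAN", "indexName" : "username_1", ... }
// }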

Is a unique index faster than non unique?

In theory there is a slight difference in update performance, as the engine needs to enforce uniqueness in a unique index, but in reality this is only going to be at most a few CPU cycles per row, so the difference will be unnoticeable.

What is the difference between an index and a unique index?

Index: It is a schema object which is used to provide improved performance in the retrieval of rows from a table. Unique Index: Unique indexes guarantee that no two rows of a table have duplicate values in the key column (or columns).
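
A minimal shell illustration of the distinction (the products collection and its fields are made up for this example):

// Plain index: speeds up lookups, but duplicate sku values are allowed
db.products.createIndex({sku: 1})

// Unique index: speeds up lookups AND rejects duplicate serial values
db.products.createIndex({serial: 1}, {unique: true})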


1 Answer

Writes cannot be applied simultaneously to the dataset. When a write is sent to a MongoDB instance, be it a shard or a standalone server, here is what happens:

  1. A collection-wide write lock (which resides in RAM) is requested.
  2. When the lock is granted, the data to be written (be it an update, an upsert or a new document) is checked against the unique indices (which usually reside in RAM); see the sketch after this list.
  3. If there is no collision, the data is applied to the dataset in RAM.
  4. The lock is released. Only then can other writes start modifying the data in memory.
  5. With the default write concern, the query returns at this point.
  6. After at most commitIntervalMs milliseconds, the data is written to the journal.
  7. Only after syncInterval seconds (60 by default) is the journal applied to the data files.
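
Because every write has to pass through steps 1-4 one at a time, two "simultaneous" duplicate inserts are in fact serialized: whichever acquires the lock first passes the index check and is applied; the other then fails the check in step 2. A minimal shell sketch, using an unordered bulk insert so that both documents are attempted:

db.users.insertMany(
    [{username: "bob"}, {username: "bob"}],
    {ordered: false}
)
// Throws a BulkWriteError: one document is inserted, the other
// is rejected with an E11000 duplicate key error.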

That being said, we can look at the actual values. 1 million writes/second seems a bit much for a single server (simply because the mass storage can't handle it), so let's assume a sharded cluster with 10 shards and a shard key which distributes the writes more or less evenly. As we have seen above, all operations are applied in RAM. With today's hardware, some 3.5 billion instructions/s can be processed, or 3.5 instructions per nanosecond. Let's assume getting and releasing a lock each take 35 instructions, or 10 nanoseconds. So locking and unlocking together cost 20 nanoseconds per write; for the 100k writes/second each shard handles, that adds up to 2,000,000 nanoseconds, or 1/500 of a second.

That leaves 499/500 of a second, or 998,000,000 nanoseconds, for the other work MongoDB needs to do, which translates to a whopping 3.493 billion instructions.
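
Spelled out, the arithmetic looks like this (plain JavaScript, runnable in the mongo shell; all numbers are the rough assumptions from the text above):

var writesPerShard = 1000000 / 10;  // 1M writes/s spread over 10 shards
var lockUnlockNs   = 2 * 35 / 3.5;  // 70 instructions at 3.5 instructions/ns = 20 ns
var lockOverheadNs = writesPerShard * lockUnlockNs;  // 2,000,000 ns = 1/500 s
var remainingNs    = 1000000000 - lockOverheadNs;    // 998,000,000 ns
var instructions   = remainingNs * 3.5;              // 3.493 billion instructions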

The locks that prevent concurrent writes are far from being the limiting factor for write operations. Syncing the changes to the journal and the data files is usually the limiting factor, followed by having too little RAM to keep the indices and the working set in memory.
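
Both sync intervals are tunable. A hedged sketch, assuming the journalCommitInterval and syncdelay server parameters (names, defaults, and availability differ between versions and storage engines):

// Journal commit interval in milliseconds (step 6 above)
db.adminCommand({setParameter: 1, journalCommitInterval: 100})

// Seconds between flushes of the in-memory data to the data files (step 7 above)
db.adminCommand({setParameter: 1, syncdelay: 60})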

answered Sep 18 '22 by Markus W Mahlberg