How do unique indexes really work and avoid collisions?

Suppose I have a collection where I create a unique index on a field:

db.users.createIndex({username: 1}, {unique:true})

What happens if two documents with the same username are SIMULTANEOUSLY being inserted in the collection?
How does the database prevent the collision? I mean which one gets inserted and which results in an error?
Assuming the inserts are really SIMULTANEOUS there is no way for the database to know that two duplicates are being inserted, right?
So, what's really going on?
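
For concreteness, this is what the collision looks like from the mongo shell: whichever insert is applied first succeeds, and the other fails with a duplicate key error (a minimal sketch, using the collection and field from the question; the exact error message varies by server version):

db.users.insert({username: "alice"})
// WriteResult({ "nInserted" : 1 })

db.users.insert({username: "alice"})
// WriteResult({
//   "nInserted" : 0,
//   "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error ..." }
// })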

asked May 13 '15 by Core_dumped


People also ask

How do unique indexes work?

A unique index ensures that the values in the index key columns are unique. A unique constraint also guarantees that no duplicate values can be inserted into the column(s) on which the constraint is created. When a unique constraint is created a corresponding unique index is automatically created on the column(s).
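
In MongoDB the unique constraint and the unique index are one and the same object; the unique flag is visible on the index specification (a minimal shell sketch, continuing with the users collection from the question; output trimmed):

db.users.getIndexes()
// [
//   { "key" : { "_id" : 1 }, "name" : "_id_" },
//   { "key" : { "username" : 1 }, "name" : "username_1", "unique" : true }
// ]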

Does unique index improve performance?

In addition to enforcing the uniqueness of data values, a unique index can also be used to improve data retrieval performance during query processing.
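
As a sketch of that retrieval benefit, explain() on an equality query against the indexed field reports an index scan (IXSCAN) instead of a full collection scan (output heavily trimmed; exact fields vary by server version):

db.users.find({username: "alice"}).explain("executionStats")
// "winningPlan" : {
//   "stage" : "FETCH",
//   "inputStage" : { "stage" : "IXSCAN", "indexName" : "username_1", ... }
// }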

Is a unique index faster than non unique?

In theory there is a slight difference in update performance, as the engine needs to enforce uniqueness in a unique index, but in reality this is only going to be at most a few CPU cycles per row, so the difference will be unnoticeable.

What is the difference between an index and a unique index?

Index: It is a schema object which is used to provide improved performance in the retrieval of rows from a table. Unique Index: Unique indexes guarantee that no two rows of a table have duplicate values in the key column (or columns).
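
A minimal shell illustration of the distinction (the products collection and its fields are made up for this example):

// Plain index: speeds up lookups, but duplicate sku values are allowed
db.products.createIndex({sku: 1})

// Unique index: speeds up lookups AND rejects duplicate serial values
db.products.createIndex({serial: 1}, {unique: true})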


1 Answer

Writes cannot be applied simultaneously to the dataset. When a write is sent to a MongoDB instance, be it a shard or a standalone server, here is what happens:

  1. A collection-wide write lock (which resides in RAM) is requested.
  2. When the lock is granted, the data to be written (be it an update, an upsert or a new document) is checked against the unique indices (which usually reside in RAM); see the sketch after this list.
  3. If there is no collision, the data is applied to the dataset in RAM.
  4. The lock is released. Only then can other writes start modifying the data in memory.
  5. With the default write concern, the query returns at this point.
  6. After at most commitIntervalMs milliseconds, the data is written to the journal.
  7. Only after syncInterval seconds (60 by default) is the journal applied to the data files.
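
Because every write has to pass through steps 1-4 one at a time, two "simultaneous" duplicate inserts are in fact serialized: whichever acquires the lock first passes the index check and is applied; the other then fails the check in step 2. A minimal shell sketch, using an unordered bulk insert so that both documents are attempted:

db.users.insertMany(
    [{username: "bob"}, {username: "bob"}],
    {ordered: false}
)
// Throws a BulkWriteError: one document is inserted, the other
// is rejected with an E11000 duplicate key error.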

That being said, we can look at the actual values. 1 million writes/second seems a bit much for a single server (simply because the mass storage can't handle it), so let's assume a sharded cluster with 10 shards and a shard key which distributes the writes more or less evenly. As we have seen above, all operations are applied in RAM. With today's hardware, some 3.5 billion instructions/s can be processed, or 3.5 instructions per nanosecond. Let's assume getting and releasing a lock each take 35 instructions, or 10 nanoseconds. So locking and unlocking together cost 20 nanoseconds per write; for the 100k writes/second each shard handles, that adds up to 2,000,000 nanoseconds, or 1/500 of a second.

That leaves 499/500 of a second, or 998,000,000 nanoseconds, for the other work MongoDB needs to do, which translates to a whopping 3.493 billion instructions.
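
Spelled out, the arithmetic looks like this (plain JavaScript, runnable in the mongo shell; all numbers are the rough assumptions from the text above):

var writesPerShard = 1000000 / 10;  // 1M writes/s spread over 10 shards
var lockUnlockNs   = 2 * 35 / 3.5;  // 70 instructions at 3.5 instructions/ns = 20 ns
var lockOverheadNs = writesPerShard * lockUnlockNs;  // 2,000,000 ns = 1/500 s
var remainingNs    = 1000000000 - lockOverheadNs;    // 998,000,000 ns
var instructions   = remainingNs * 3.5;              // 3.493 billion instructions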

The locks that prevent concurrent writes are far from being the limiting factor for write operations. Syncing the changes to the journal and the data files is usually the limiting factor, followed by having too little RAM to keep the indices and the working set in memory.
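
Both sync intervals are tunable. A hedged sketch, assuming the journalCommitInterval and syncdelay server parameters (names, defaults, and availability differ between versions and storage engines):

// Journal commit interval in milliseconds (step 6 above)
db.adminCommand({setParameter: 1, journalCommitInterval: 100})

// Seconds between flushes of the in-memory data to the data files (step 7 above)
db.adminCommand({setParameter: 1, syncdelay: 60})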

answered Sep 18 '22 by Markus W Mahlberg