Possibility of duplicate Mongo ObjectId's being generated in two different collections?

Short Answer

Just to add a direct response to your initial question: YES, if you use BSON Object ID generation, then for most drivers the IDs are almost certainly going to be unique across collections. See below for what "almost certainly" means.

Long Answer

The BSON Object IDs generated by MongoDB drivers are highly likely to be unique across collections. This is mainly because of the last 3 bytes of the ID, which most drivers generate from a static incrementing counter. That counter is collection-independent; it's global to the process. The Java driver, for example, uses a randomly initialized, static AtomicInteger.

So why, in the Mongo docs, do they say that the IDs are "highly likely" to be unique, instead of outright saying that they WILL be unique? Three possibilities can occur where you won't get a unique ID (please let me know if there are more):

Before this discussion, recall that the BSON Object ID consists of:

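[4 bytes seconds since epoch, 3 bytes machine hash, 2 bytes process ID, 3 bytes counter]

(Newer driver versions replace the machine hash and process ID with a single 5-byte random value generated once per process. The timestamp and counter parts are unchanged, so the reasoning below still applies.)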
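To make the layout concrete, here is a minimal Python sketch that assembles 12 bytes in the order listed above. It is not any driver's actual code: `make_object_id` is a hypothetical helper, and hashing the hostname with MD5 for the machine bytes is an assumption made for illustration. The point to notice is that the counter lives at module level, so it is shared by every collection the process writes to.

```python
import hashlib
import itertools
import os
import random
import socket
import struct
import time

# One counter for the whole process, randomly initialised. It is not tied
# to any collection, which is why IDs differ across collections too.
_counter = itertools.count(random.randrange(1 << 24))


def make_object_id(now=None):
    """Build 12 bytes in the legacy layout (hypothetical helper)."""
    seconds = int(time.time() if now is None else now)
    # Assumption: derive the 3 machine bytes from a hash of the hostname.
    machine = hashlib.md5(socket.gethostname().encode()).digest()[:3]
    pid = os.getpid() & 0xFFFF            # keep the low 2 bytes
    counter = next(_counter) & 0xFFFFFF   # keep the low 3 bytes
    return (
        struct.pack(">I", seconds)        # 4 bytes, big-endian
        + machine                         # 3 bytes
        + struct.pack(">H", pid)          # 2 bytes
        + counter.to_bytes(3, "big")      # 3 bytes
    )
```

Two calls within the same second share the first 9 bytes and differ only in the counter bytes.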

Here are the three possibilities, so you judge for yourself how likely it is to get a dupe:

1) Counter overflow: there are 3 bytes in the counter. If you happen to insert over 16,777,216 (2^24) documents in a single second, on the same machine, in the same process, then you may overflow the incrementing counter bytes and end up with two Object IDs that share the same time, machine, process, and counter values.
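The wrap-around in scenario 1 is plain modular arithmetic. A small sketch, where `nth_counter` is a hypothetical helper and the starting value is arbitrary:

```python
COUNTER_BITS = 24
COUNTER_SPACE = 1 << COUNTER_BITS  # 16,777,216 distinct counter values


def nth_counter(start, n):
    """Counter value after n increments, truncated to 3 bytes."""
    return (start + n) % COUNTER_SPACE


start = 16_777_214
print([nth_counter(start, n) for n in range(4)])   # [16777214, 16777215, 0, 1]
# After exactly 2^24 increments the counter is back where it began.
print(nth_counter(start, COUNTER_SPACE) == start)  # True
```

Wrapping by itself is harmless. A duplicate only appears if the counter returns to a value it already used while the timestamp, machine, and process bytes are still the same, which requires more than 2^24 IDs within one second.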

2) Counter non-incrementing: some Mongo drivers use random numbers instead of incrementing numbers for the counter bytes. In these cases, any given pair of IDs has a 1/16,777,216 chance of colliding, but only if those two IDs are generated in the same second (i.e. before the time section of the ID updates to the next second), on the same machine, in the same process. The more IDs you generate within that second, the more pairs there are, so the overall chance of a dupe grows with insert rate.
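For scenario 2, the chance of at least one collision among several random counters drawn in the same second is the classic birthday problem. The sketch below assumes the counter bytes are drawn uniformly at random; `collision_probability` is a hypothetical helper, not part of any driver.

```python
import math

COUNTER_SPACE = 1 << 24  # 16,777,216 possible 3-byte values


def collision_probability(n):
    """Chance that at least two of n uniformly random counters match."""
    if n > COUNTER_SPACE:
        return 1.0  # pigeonhole: more IDs than values
    # Sum logs of (1 - k/N) to get log P(all n values distinct).
    log_distinct = sum(math.log1p(-k / COUNTER_SPACE) for k in range(n))
    return -math.expm1(log_distinct)


for n in (2, 100, 1000):
    print(n, collision_probability(n))
```

With two IDs the result is the 1/16,777,216 figure quoted above. At around 1,000 IDs in the same second, on the same machine and process, it is already roughly 3%, which is why an incrementing counter is the safer choice.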

3) Machine and process hash to the same values. The machine ID and process ID values may, in some highly unlikely scenario, map to the same values for two different machines. If this occurs, and at the same time the two counters on the two different machines, during the same second, generate the same value, then you'll end up with a duplicate ID.

These are the three scenarios to watch out for. Scenarios 1 and 3 seem highly unlikely, and scenario 2 is totally avoidable if you're using the right driver. You'll have to check the source of the driver to know for sure.

ObjectIds are generated client-side in a manner similar to UUID but with some nicer properties for storage in a database such as roughly increasing order and encoding their creation time for free. The key thing for your use case is that they are designed to guarantee uniqueness to a high probability even if they are generated on different machines.
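The "creation time for free" part can be checked without a driver, because the first 4 bytes are a big-endian count of seconds since the Unix epoch. A minimal sketch using only the standard library (`object_id_created_at` is a hypothetical helper, and the hex string is just a sample value):

```python
from datetime import datetime, timezone


def object_id_created_at(hex_id):
    """Decode the creation time embedded in a 24-character ObjectId string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")  # leading 4 bytes = timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(object_id_created_at("507f1f77bcf86cd799439011"))
```

Since the timestamp comes first, IDs created in later seconds compare greater than earlier ones, which is where the roughly increasing order comes from.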

Now if you were referring to the _id field in general, we do not require uniqueness across collections so it is safe to reuse the old _id. As a concrete example, if you have two collections, colors and fruits, both could simultaneously have an object like {_id: 'orange'}.
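The per-collection rule can be pictured with a toy model. This is not MongoDB code: the `Collection` class below is a hypothetical stand-in that only mimics the unique index on _id, with each collection keeping its own set of keys.

```python
class Collection:
    """Toy stand-in for a collection: _id must be unique within it only."""

    def __init__(self, name):
        self.name = name
        self._docs = {}

    def insert(self, doc):
        key = doc["_id"]
        if key in self._docs:
            raise KeyError(f"duplicate _id {key!r} in {self.name}")
        self._docs[key] = dict(doc)


colors = Collection("colors")
fruits = Collection("fruits")
colors.insert({"_id": "orange"})
fruits.insert({"_id": "orange"})  # accepted: a different collection
```

Inserting {_id: 'orange'} into colors a second time is what would be rejected.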

In case you want to know more about how ObjectIds are created, here is the spec: http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification

In case anyone is having problems with duplicate Mongo ObjectIDs, you should know that despite the unlikelihood of dups happening in Mongo itself, it is possible to end up with duplicate _id values when inserting from PHP.

The use-case where this has happened with regularity for me is when I'm looping through a dataset and attempting to inject the data into a collection.

The array that holds the injection data must be explicitly reset on each iteration - even if you aren't specifying the _id value. For some reason, the INSERT process adds the Mongo _id to the array as if it were a global variable (even if the array doesn't have global scope). This can affect you even if you are calling the insertion in a separate function call where you would normally expect the values of the array not to persist back to the calling function.

There are three solutions to this:

1) You can unset() the _id field from the array.
2) You can reinitialize the entire array with array() each time you loop through your dataset.
3) You can explicitly define the _id value yourself (taking care to define it in such a way that you don't generate dups yourself).

My guess is that this is a bug in the PHP interface, and not so much an issue with Mongo, but if you run into this problem, just unset the _id and you should be fine.
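The same trap can be reproduced outside PHP whenever an insert call writes the generated _id back into the object it was given. The sketch below uses Python and a hypothetical `insert` function as a stand-in for the driver call; the integer IDs and the `stored_ids` set are assumptions made to keep the example self-contained.

```python
import itertools

_next_id = itertools.count(1)
stored_ids = set()


def insert(doc):
    """Hypothetical driver stand-in: fills in _id on the caller's object
    when it is missing, then rejects an _id that is already stored."""
    if "_id" not in doc:
        doc["_id"] = next(_next_id)
    if doc["_id"] in stored_ids:
        raise ValueError(f"duplicate _id {doc['_id']}")
    stored_ids.add(doc["_id"])


rows = ["a", "b"]

# Buggy: the same dict is reused, so the _id written by the first insert
# is still present on the second iteration.
record = {}
error = None
try:
    for value in rows:
        record["value"] = value
        insert(record)
except ValueError as exc:
    error = str(exc)
print(error)

# Fixed: start from a fresh dict on every iteration (solution 2 above).
for value in rows:
    record = {"value": value}
    insert(record)
```

Removing the "_id" key before each insert (solution 1) works equally well; rebuilding the record is simply harder to get wrong.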
