
Creating custom Object ID in MongoDB

Tags: mongodb

I am creating a service for which I will use MongoDB as a storage backend. The service will produce a hash of the user input and then see if that same hash (+ input) already exists in our dataset.

The hash will be unique yet random (that is, non-incremental and non-sequential), so my question is:

  1. Is it -legitimate- to use a random value for an Object ID? Example:

$object_id = new MongoId($hex_of_96bit_hash); // $hex_of_96bit_hash: 24-character hex string of the 96-bit hash

Or will MongoDB treat the ObjectID differently from other server-produced ones, since a "real" ObjectID also contains timestamps, machine_id, etc?

What are the pros and cons of using a 'random' value? I guess it would be statistically slower for the engine to update the index on inserts when the new _id values are not incremental in any way. Am I correct on that?

Joe asked Aug 31 '12 07:08



3 Answers

Yes, it is perfectly fine to use a random value as the _id. If a document already has a value in its _id field when it is stored, MongoDB uses that value as the document's identifier instead of generating an ObjectId.

Since the _id field is always indexed and acts as the primary key, you need to make sure that a different value is generated for each object. There are some guidelines for optimizing user-defined _id values:

https://docs.mongodb.com/manual/core/document/#the-id-field
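To make the uniqueness requirement concrete, here is a minimal Python sketch of the workflow described in the question: derive a 96-bit value from a hash of the input, use it as the identifier, and check whether the same hash (and input) is already stored. Everything in it is an assumption for illustration: the question does not name a hash algorithm (SHA-1 truncated to 12 bytes is used here), the names hash_to_id_hex and insert_if_new are hypothetical, and a plain dict stands in for the MongoDB collection so the example runs without a driver.

```python
import hashlib


def hash_to_id_hex(user_input: str) -> str:
    """Return a 24-character hex string (96 bits) derived from the input."""
    digest = hashlib.sha1(user_input.encode("utf-8")).digest()  # 20 bytes
    return digest[:12].hex()  # keep the first 12 bytes = 96 bits


# Stand-in for the collection: key plays the role of _id, value is the input.
store = {}


def insert_if_new(user_input: str) -> str:
    """Store the input under its hash-derived id.

    Returns 'inserted' for a new id, 'duplicate' when the same input is
    already stored, and 'collision' when a different input has the same id.
    """
    _id = hash_to_id_hex(user_input)
    existing = store.get(_id)
    if existing is None:
        store[_id] = user_input
        return "inserted"
    return "duplicate" if existing == user_input else "collision"


first = insert_if_new("hello")
second = insert_if_new("hello")
third = insert_if_new("world")
```

With a real collection the lookup and the insert would be separate operations (or a single insert that fails on a duplicate key), so the 'collision' branch is the case that would need an explicit strategy.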

DhruvPathak answered Sep 27 '22 17:09


While any values, including hashes, can be used for the _id field, I would recommend against using random values for two reasons:

  1. You may need to develop a collision-management strategy in case you produce identical values for two different objects. In the question, you imply that you'll generate IDs using some type of hash algorithm. I would not consider these values "random", as they are based on the content you are digesting with the hash. The probability of a collision is then a function of the diversity of the content and of the hash algorithm. If you are using something like MD5 or SHA-1, I wouldn't worry about the algorithm, just the content you are hashing. If you do need a collision-management strategy, then you definitely should not use random or hash-based IDs, because collision management in a clustered environment is complicated and requires additional queries.

  2. Random values, like hash values, are deliberately dispersed along the number line. That (a) requires more of the B-tree index to be kept in memory at all times and (b) may cause variable insert performance due to B-tree rebalancing. MongoDB is optimized to handle ObjectIds, which come in ascending order (with one-second time granularity). You're likely better off sticking with them.
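The ordering difference in point 2 can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: objectid_like is a hypothetical helper that mimics the general shape of an ObjectId (a 4-byte big-endian timestamp in seconds, followed by 8 further bytes, here 5 random bytes and a 3-byte counter), and hash_id truncates SHA-1 to 12 bytes as a stand-in for whatever hash the service would use. It does not measure index behaviour; it only shows that timestamp-prefixed values sort in creation order while hash-derived values do not.

```python
import hashlib
import os
import struct


def objectid_like(timestamp: int, counter: int) -> bytes:
    """Build 12 bytes shaped like an ObjectId (illustrative, not the driver's code)."""
    return (
        struct.pack(">I", timestamp)    # 4-byte big-endian seconds
        + os.urandom(5)                 # 5 filler bytes
        + counter.to_bytes(3, "big")    # 3-byte counter
    )


def hash_id(payload: str) -> bytes:
    """Derive a 12-byte id from the content (SHA-1 is an assumed choice)."""
    return hashlib.sha1(payload.encode("utf-8")).digest()[:12]


def is_ascending(ids) -> bool:
    """True when every id compares strictly greater than the one before it."""
    return all(a < b for a, b in zip(ids, ids[1:]))


base = 1_346_400_000  # arbitrary starting second, one id per second below
time_ids = [objectid_like(base + i, i) for i in range(1000)]
hash_ids = [hash_id(f"document {i}") for i in range(1000)]

time_ordered = is_ascending(time_ids)
hash_ordered = is_ascending(hash_ids)
```

Because the timestamp occupies the leading bytes, each new time-based id lands at the right-hand edge of a sorted index, whereas each hash-based id lands at an effectively arbitrary position.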

Sim answered Sep 27 '22 15:09


I just found an answer to one of my questions, the one about indexing performance:

If the _id's are in a somewhat well defined order, on inserts the entire b-tree for the _id index need not be loaded. BSON ObjectIds have this property.

Source: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs

Joe answered Sep 27 '22 17:09