We are migrating a database from MySQL to MongoDB for performance reasons and considering what to use for IDs of the MongoDB documents. We are debating between using ObjectIDs, which is the MongoDB default, or using UUIDs instead (which is what we have been using up until now in MySQL). So far, the arguments we have to support any of these options are the following: ObjectIDs: ObjectIDs are the MongoDB default and I assume (although I'm not sure) that this is for a reason, meaning that I expect that MongoDB can handle them more efficiently than UUIDs or has another reason for preferring them. I also found this stackoverflow answer that mentions that usage of ObjectIDs makes indexing more efficient, it would be nice however to have some metrics on how much this "more efficient" is. UUIDs: Our basic argument in favour of using UUIDs (and it is a quite important one) is that they are supported, one way or another, by virtually any database. This means that if some way down the road we decide to switch from MongoDB to something else for whatever reason and we already have an API that retrieves documents from the DB based on their IDs nothing changes for the clients of this API since the IDs can continue to be exactly the same. If we were to use ObjectIDs I'm not really sure how we would go about migrating them to another DB. Does anyone have any insights on whether one of these options may be better than the other and why? Have you ever used UUIDs in MongoDB instead of ObjectIDs and if yes what were the advantages / problems you came across?

Using UUIDs in Mongo is certainly possible and reasonably well supported. For example, the Mongo docs list UUIDs as one of the common options for the <code>_id</code> field. <h3>Considerations</h3> <ul> <li> Performance – As other answers mention, benchmarks show UUIDs cause a performance drop for inserts. In the worst case measured (going from 10M to 20M docs in a collection) they've about ~2-3x slower – the difference between inserting 2,000 (UUID) and 7,500 (ObjectID) docs per second. This is a large difference but its significance depends entirely on you use case. Will you be batch inserting millions of docs at a time? For most apps I've build the common case is inserting individual documents. The same benchmarks show that, for that usage pattern, the difference is much smaller (6,250 -vs- 7,500; ~20%). Not insignificant.. but not earth shattering either.</li> <li> Portability – Many other DB platforms have good UUID support so portability would be improved. Alternatively, since UUIDs are larger (more bits) it is possible to repack an ObjectID into the "shape" of a UUID. This approach isn't as nice as direct portability but it does give you a way to "map" between existing ObjectIDs and UUIDs.</li> <li> Decentralisation – One of the big selling points of UUIDs is that they're universally unique. This makes it practical to generate them anywhere, in a decentralised fashion (in contrast to, for example an auto-incrementing value, that requires a centralised source of truth to determine the "next" value). Of course, Mongo Object IDs profess this benefit too. The difference is, UUIDs are based on a 15+ year old standard and supported on (nearly?) all platforms, languages, etc. This makes them very useful if you ever need to create entities (or specifically, sets of related entities) in disjointed systems, without interacting with the database. You can create a dataset with IDs and foreign keys in place, then write the whole graph into the database at some point in the future without conflict. Although this is also possible with Mongo ObjectIDs, finding code to generate them/work with the format will often be harder.</li> </ul> <h3>Corrections</h3> Contrary to some of the other answers: <ul> <li> UUIDs do have native Mongo support – You can use the <code>UUID()</code> function in the Mongo Shell exactly the same way you'd use <code>ObjectID()</code>; to convert a UUID string into equivalent BSON object.</li> <li> UUIDs are not especially large – When encoded using binary subtype <code>0x04</code> they're 128 bits, compared to 96 bits for ObjectIDs. (If encoded as strings they will be pretty wasteful, taking around 288 bits.)</li> <li> UUIDs can include a timestamp – Specifically, UUIDv1 encodes a timestamp with 60 bits of precision, compared to 32 bits in ObjectIDs. This is over 6 orders of magnitude more precision, so nano-seconds instead of seconds. It can actually be a decent way of storing create timestamps with more accuracy than Mongo/JS Date objects support, however... <ul> <li>The build in <code>UUID()</code> function only generates v4 (random) UUIDs so, to leverage this this, you'd to lean on on your app or Mongo driver for ID creation.</li> <li>Unlike ObjectIDs, because of the way UUIDs are chunked, the timestamp doesn't give you a natural order. This can be good or bad depending on your use case. (New standards may change this; see 2021 update below.)</li> <li>Including timestamps in your IDs is sometimes a Bad Idea. You end up leaking the created time of documents anywhere an ID is exposed. (Of course ObjectIDs also encode a timestamp so this is partly true for them too.)</li> <li>If you do this with (spec compliant) v1 UUIDs, you're also encoding part of the servers MAC address, which can potentially be used to identify the machine. Probably not an issue for most systems but also not ideal. (New standards may change this; see 2021 update below.)</li> </ul> </li> </ul> <h3>Conclusion</h3> If you think about your Mongo DB in isolation, ObjectIDs are the obvious choice. They work well out of the box and are a perfectly capable default. Using UUIDs instead does add some friction, both when working with the values (needing to convert to binary types, etc.) and in terms of performance. Whether this slight inconvenience is worth having a standardised ID format really depends on the importance you place on portability and your architectural choices. Will you be syncing data between different database platforms? Will you migrate your data to a different platform in the future? Do you need to generate IDs outside the database, in other systems or in the browser? If not now at some point in the future? UUIDs might be worth the hassle. <h3>Aug 2021 Update</h3> The IEFT recently published a draft update to the UUID spec that would introduce some new versions of the format. Specifically, UUIDv6 and UUIDv7 are based on UUIDv1 but flip the timestamp chunks so the bits are arranged from most significant to least significant. This gives the resultant values a natural order that (more or less) reflects the order in which they were created. The new versions also exclude data derived from the servers MAC address, addressing a long-standing criticism of v1 UUIDs. It'll take time for these changes to flow though to implementations but (IMHO) they significantly modernise and improve the format.

The <code>_id</code> field of MongoDB can have any value you want as long as you can guarantee that it is unique for the collection. When your data already has a natural key, there is no reason not to use this in place of the auto-generated ObjectIDs. ObjectIDs are provided as a reasonable default solution to safe time generating an own unique key (and to discourage beginners from trying to copy SQL's <code>AUTO INCREMENT</code> which is a bad idea in a distributed database). By not using ObjectIDs you also miss out on another convenience feature: An ObjectID also includes an unix timestamp when it was generated, and many drivers provide a funtion to extract it and convert it to a date. This can sometimes make a separate <code>create-date</code> field redundant. But when neither is a concern for you, you are free to use your UUIDs as <code>_id</code> field.

Using UUIDs instead of ObjectIDs in MongoDB

Tags:

mongodb

We are migrating a database from MySQL to MongoDB for performance reasons and considering what to use for IDs of the MongoDB documents. We are debating between using ObjectIDs, which is the MongoDB default, or using UUIDs instead (which is what we have been using up until now in MySQL). So far, the arguments we have to support any of these options are the following:

ObjectIDs: ObjectIDs are the MongoDB default and I assume (although I'm not sure) that this is for a reason, meaning that I expect that MongoDB can handle them more efficiently than UUIDs or has another reason for preferring them. I also found this stackoverflow answer that mentions that usage of ObjectIDs makes indexing more efficient, it would be nice however to have some metrics on how much this "more efficient" is.

UUIDs: Our basic argument in favour of using UUIDs (and it is a quite important one) is that they are supported, one way or another, by virtually any database. This means that if some way down the road we decide to switch from MongoDB to something else for whatever reason and we already have an API that retrieves documents from the DB based on their IDs nothing changes for the clients of this API since the IDs can continue to be exactly the same. If we were to use ObjectIDs I'm not really sure how we would go about migrating them to another DB.

Does anyone have any insights on whether one of these options may be better than the other and why? Have you ever used UUIDs in MongoDB instead of ObjectIDs and if yes what were the advantages / problems you came across?

874

asked Mar 06 '15 08:03

Christina

2 Answers

Using UUIDs in Mongo is certainly possible and reasonably well supported. For example, the Mongo docs list UUIDs as one of the common options for the _id field.

Considerations

Performance – As other answers mention, benchmarks show UUIDs cause a performance drop for inserts. In the worst case measured (going from 10M to 20M docs in a collection) they've about ~2-3x slower – the difference between inserting 2,000 (UUID) and 7,500 (ObjectID) docs per second. This is a large difference but its significance depends entirely on you use case. Will you be batch inserting millions of docs at a time? For most apps I've build the common case is inserting individual documents. The same benchmarks show that, for that usage pattern, the difference is much smaller (6,250 -vs- 7,500; ~20%). Not insignificant.. but not earth shattering either.
Portability – Many other DB platforms have good UUID support so portability would be improved. Alternatively, since UUIDs are larger (more bits) it is possible to repack an ObjectID into the "shape" of a UUID. This approach isn't as nice as direct portability but it does give you a way to "map" between existing ObjectIDs and UUIDs.
Decentralisation – One of the big selling points of UUIDs is that they're universally unique. This makes it practical to generate them anywhere, in a decentralised fashion (in contrast to, for example an auto-incrementing value, that requires a centralised source of truth to determine the "next" value). Of course, Mongo Object IDs profess this benefit too. The difference is, UUIDs are based on a 15+ year old standard and supported on (nearly?) all platforms, languages, etc. This makes them very useful if you ever need to create entities (or specifically, sets of related entities) in disjointed systems, without interacting with the database. You can create a dataset with IDs and foreign keys in place, then write the whole graph into the database at some point in the future without conflict. Although this is also possible with Mongo ObjectIDs, finding code to generate them/work with the format will often be harder.

Corrections

Contrary to some of the other answers:

UUIDs do have native Mongo support – You can use the UUID() function in the Mongo Shell exactly the same way you'd use ObjectID(); to convert a UUID string into equivalent BSON object.
UUIDs are not especially large – When encoded using binary subtype 0x04 they're 128 bits, compared to 96 bits for ObjectIDs. (If encoded as strings they will be pretty wasteful, taking around 288 bits.)
UUIDs can include a timestamp – Specifically, UUIDv1 encodes a timestamp with 60 bits of precision, compared to 32 bits in ObjectIDs. This is over 6 orders of magnitude more precision, so nano-seconds instead of seconds. It can actually be a decent way of storing create timestamps with more accuracy than Mongo/JS Date objects support, however...
- The build in UUID() function only generates v4 (random) UUIDs so, to leverage this this, you'd to lean on on your app or Mongo driver for ID creation.
- Unlike ObjectIDs, because of the way UUIDs are chunked, the timestamp doesn't give you a natural order. This can be good or bad depending on your use case. (New standards may change this; see 2021 update below.)
- Including timestamps in your IDs is sometimes a Bad Idea. You end up leaking the created time of documents anywhere an ID is exposed. (Of course ObjectIDs also encode a timestamp so this is partly true for them too.)
- If you do this with (spec compliant) v1 UUIDs, you're also encoding part of the servers MAC address, which can potentially be used to identify the machine. Probably not an issue for most systems but also not ideal. (New standards may change this; see 2021 update below.)

Conclusion

If you think about your Mongo DB in isolation, ObjectIDs are the obvious choice. They work well out of the box and are a perfectly capable default. Using UUIDs instead does add some friction, both when working with the values (needing to convert to binary types, etc.) and in terms of performance. Whether this slight inconvenience is worth having a standardised ID format really depends on the importance you place on portability and your architectural choices.

Will you be syncing data between different database platforms? Will you migrate your data to a different platform in the future? Do you need to generate IDs outside the database, in other systems or in the browser? If not now at some point in the future? UUIDs might be worth the hassle.

Aug 2021 Update

The IEFT recently published a draft update to the UUID spec that would introduce some new versions of the format.

Specifically, UUIDv6 and UUIDv7 are based on UUIDv1 but flip the timestamp chunks so the bits are arranged from most significant to least significant. This gives the resultant values a natural order that (more or less) reflects the order in which they were created. The new versions also exclude data derived from the servers MAC address, addressing a long-standing criticism of v1 UUIDs.

It'll take time for these changes to flow though to implementations but (IMHO) they significantly modernise and improve the format.

142

answered Oct 03 '22 12:10

Molomby

The _id field of MongoDB can have any value you want as long as you can guarantee that it is unique for the collection. When your data already has a natural key, there is no reason not to use this in place of the auto-generated ObjectIDs.

ObjectIDs are provided as a reasonable default solution to safe time generating an own unique key (and to discourage beginners from trying to copy SQL's AUTO INCREMENT which is a bad idea in a distributed database).

By not using ObjectIDs you also miss out on another convenience feature: An ObjectID also includes an unix timestamp when it was generated, and many drivers provide a funtion to extract it and convert it to a date. This can sometimes make a separate create-date field redundant.

But when neither is a concern for you, you are free to use your UUIDs as _id field.

answered Oct 03 '22 14:10

Philipp

Related questions
                            
                                What are the advantages of using a schema-free database like MongoDB compared to a relational database?
                            
                                MongoDB Schema Design - Many small documents or fewer large documents?
                            
                                How can I wait for a docker container to be up and running?
                            
                                Failed to start mongod.service: Unit mongod.service not found
                            
                                Mongoose limit/offset and count query
                            
                                How to implement has_many :through relationships with Mongoid and mongodb?
                            
                                String field value length in mongoDB
                            
                                Best way to perform a full text search in MongoDB and Mongoose
                            
                                MongoDB with redis
                            
                                How To Create Mongoose Schema with Array of Object IDs?
                            
                                search by ObjectId in mongodb with pymongo
                            
                                MongoDB: How do I update a single subelement in an array, referenced by the index within the array?
                            
                                Mongo interface [closed]
                            
                                How to avoid transparent_hugepage/defrag warning from mongodb?
                            
                                How to protect the password field in Mongoose/MongoDB so it won't return in a query when I populate collections?
                            
                                Why does Mongoose have both schemas and models?
                            
                                How does column-oriented NoSQL differ from document-oriented?
                            
                                In mongoDb, how do you remove an array element by its index?
                            
                                Creating Multifield Indexes in Mongoose / MongoDB
                            
                                Mongoose.js: Find user by username LIKE value

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With