I am using java/morphia to deal with mongodb. The default ObjectId is not very convenient to use from Java layer. I would like to make it a String type while keep the key generation process using ObjectId, say _id = new ObjectId.toString()
.
I want to know if there is any side effects doing it this way? For example, will it impact the database performance or causing key conflicts in any means? Will it affect the sharding environment ...
In a modern database, such as MongoDB, we need a unique identifier in an _id field as a primary key as well. MongoDB provides an automatic unique identifier for the _id field in the form of an ObjectId data type. datatype is automatically generated as a unique document identifier if no other identifier is provided.
MongoDB stores data in BSON format both internally, and over the network, but that doesn't mean you can't think of MongoDB as a JSON database.
@KevinMeredith As specified here, yes, an _id field is mandatory. «In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field».
By default, MongoDB generates a unique ObjectID identifier that is assigned to the _id field in a new document before writing that document to the database. In many cases the default unique identifiers assigned by MongoDB will meet application requirements.
You can use any type of value for an _id
field (except for Arrays). If you choose not to use ObjectId, you'll have to somehow guarantee uniqueness of values (casting ObjectId to string will do). If you try to insert duplicate key, error will occur and you'll have to deal with it.
I'm not sure what effect will it have on sharded cluster when you attempt to insert two documents with the same _id to different shards. I suspect that it will let you insert, but this will bite you later. (I'll have to test this).
That said, you should have no troubles with _id = (new ObjectId).toString()
.
I actually did the same thing because I was having some problem converting the ObjectId to JSON.
I then did something like
@Id private String id; public String getId() { return id(); } public void setId(String id) { this.id = id; }
And everything worked fine untill I decided to update a previously inserted document, when i got the object by Id sent it to the page via JSON and receive the same updated object also by JSON post and then used the save function from the Datastore, instead of updating the previous data it inserted a new document instead of updating the one that was already.
Even worst the new document had the same ID than the previously inserted one, something i thought was impossible.
Anyway i setted the private object as an ObjectID and just left the get set as string and then it worked as expected, not sure that helps in your case thought.
@Id private ObjectId id; public String getId() { return id.toString(); } public void setId(String id) { this.id = new ObjectId(id); }
Yes, you can use a string as your _id.
I'd recommend it only if you have some value (in the document) that naturally is a good unique key. I used this design in one collection where there was a string geo-tag, of the form "xxxxyyyy"; this unique-per-document field was going to HAVE to be in the document and I had to build an index on it... so why not use it as a key? (This avoided one extra key-value pair, AND avoided a second index on the collection since MongoDB naturally builds an index on "_id". Given the size of the collection, both of these added up to some serious space savings.)
However, from the tone of your question ("ObjectIDs are not very convenient"), if the only reason you want to use a string is you don't want to be bothered with figuring out how to neatly manage ObjectIDs... I'd suggest it is worth your time to get your head around them. I'm sure they are no trouble... once you've figured out your trouble with them.
Otherwise: what are your options? Will you concoct string IDs EVERY TIME you use MongoDB in the future?
I would like to add that it is not always a good idea to use the automatically generated BSON ObjectID as a unique identifier, if it gets passed to the application: it can potentially be manipulated by the user.
ObjectIDs appear to be generated sequentially, so if you fail to implement the necessary authorization mechanisms, malicious user could simply increment the value he has, to access resources he should not have access to.
UPDATE: Since version 3.4+ the ObjectIDs are no longer generated incrementally. Please see 3.2 docs vs the latest docs
Therefore using UUID type identifiers will provide a layer of security-through-obscurity. Of course, Authorization (is this user allowed to access requested resource) is a must, but you should be aware of the aforementioned ObjectID feature.
To get the best of both worlds, generate UUID which matches your ObjectID length (12 or 24 characters) and use it to create your own _id of ObjectID type.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With