 

For Mongodb is it better to reference an object or use a natural String key?

Tags:

mongodb

I am building a corpus of indexed sentences in different languages. I have a Languages collection in which each document has both an ObjectId and its ISO code as a key. Is it better to use a reference to the Languages collection, or to store a key like "en" or "fr"?

I suppose it's a compromise between:

  • ease of referencing the Language object in that collection
  • speed of queries for sentences in a given language
  • the size of the data on disk

Any best practices that I should know of?

Nic Cottrell asked May 18 '11 at 14:05




2 Answers

In the end, it really comes down to personal choice and what will work best for your application.

The only requirement that MongoDB imposes on _id is that it be unique. It can be an ObjectId (the default), a string, or even an embedded document, though as I recall it cannot be an array.

In this case, you can likely guarantee that the ISO code is unique, which makes it a good candidate for _id. It gives you a 'known' primary key that is meaningful in its own right, so using it instead of a generated ID is probably the more sensible choice. It also means that anywhere you reference a language from another collection, you can store the ISO code instead of an ObjectId, and anyone browsing your raw data can immediately tell what the reference points at.

As an aside:

The two big benefits of ObjectIds are that they can be generated uniquely across multiple machines, processes and threads without any central sequence tracking by the MongoDB server, and that MongoDB stores them as a special type that uses only 12 bytes (as opposed to 24 bytes for the hexadecimal string form of an ObjectId).
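To make the "no central sequence" point concrete, here is a minimal Node.js sketch of how an ObjectId-style value can be built. It assumes the layout described in MongoDB's current ObjectId documentation: a 4-byte big-endian timestamp in seconds, a 5-byte random value chosen once per process, and a 3-byte counter that starts at a random value. The helper name `makeObjectIdLike` is hypothetical and this is not the driver's implementation; real code should use the driver's own ObjectId class.

```javascript
const crypto = require("crypto");

// Chosen once per process: lets independent processes and machines avoid
// collisions without asking a central server for the next sequence number.
const PROCESS_RANDOM = crypto.randomBytes(5);
// 3-byte counter, seeded with a random value and incremented per id.
let counter = crypto.randomBytes(3).readUIntBE(0, 3);

// Hypothetical helper: builds a 12-byte ObjectId-style value.
function makeObjectIdLike(date = new Date()) {
  const buf = Buffer.alloc(12);
  // Bytes 0-3: seconds since the Unix epoch, big-endian.
  buf.writeUInt32BE(Math.floor(date.getTime() / 1000), 0);
  // Bytes 4-8: the per-process random value.
  PROCESS_RANDOM.copy(buf, 4);
  // Bytes 9-11: the counter, wrapping at 2^24.
  counter = (counter + 1) % 0x1000000;
  buf.writeUIntBE(counter, 9, 3);
  return buf;
}

const id = makeObjectIdLike();
console.log(id.length);                 // 12 bytes in binary form
console.log(id.toString("hex").length); // 24 characters as a hex string
```

Because each process contributes its own random bytes and counter, two application servers can create ids at the same moment without coordinating, which is the property described above.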

Brendan W. McAdams answered Oct 16 '22 at 06:10


Unless disk space is an issue, I'd probably go with a language key like "en" or "fr". That saves an additional query on the Languages collection to find the ObjectId for a given language; you can just query the sentences directly:

db.sentences.find( { lang: "en" } )

So long as the lang field is indexed (db.sentences.createIndex( { lang: 1 } ); older versions used ensureIndex, which is now deprecated), I don't think there'll be much difference in query performance.
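The difference between the two designs can be illustrated without a database. The sketch below is a toy in-memory model, not MongoDB: plain arrays stand in for the collections, hex strings stand in for ObjectIds, and the field names `langId` and `lang` are hypothetical. With a reference, finding sentences in a language needs a lookup in the Languages data first; with the ISO code stored directly, a single filter is enough.

```javascript
// Toy stand-ins for the Languages collection (hypothetical data;
// hex strings stand in for real ObjectId values).
const languages = [
  { _id: "64b0c0ffee0000000000000a", iso: "en", name: "English" },
  { _id: "64b0c0ffee0000000000000b", iso: "fr", name: "French" },
];

// Design 1: each sentence references the language document's _id.
const sentencesByRef = [
  { text: "Hello", langId: "64b0c0ffee0000000000000a" },
  { text: "Bonjour", langId: "64b0c0ffee0000000000000b" },
];

// Design 2: each sentence stores the natural ISO key directly.
const sentencesByKey = [
  { text: "Hello", lang: "en" },
  { text: "Bonjour", lang: "fr" },
];

// Two steps: resolve the ISO code to an _id, then filter the sentences.
function findByRef(iso) {
  const language = languages.find((l) => l.iso === iso);
  if (!language) return [];
  return sentencesByRef.filter((s) => s.langId === language._id);
}

// One step: filter on the key itself, like db.sentences.find({ lang: iso }).
function findByKey(iso) {
  return sentencesByKey.filter((s) => s.lang === iso);
}

console.log(findByRef("en").map((s) => s.text));
console.log(findByKey("en").map((s) => s.text));
```

Both functions return the same sentences; the reference design simply pays for one extra lookup, which an application could avoid by caching the small Languages mapping in memory.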

If you've got a humongous data set and disk space is a concern, you could consider an ObjectId (12 bytes) or a number (8 bytes), either of which might be smaller than a UTF-8 string key, depending on the string's length.
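To put numbers on that comparison, the sketch below computes the encoded size of each candidate value following the BSON specification: a string is stored as a 4-byte length prefix, its UTF-8 bytes and a trailing null byte; an ObjectId is 12 bytes; a 32-bit integer is 4 bytes; a 64-bit integer or double is 8 bytes. The per-field overhead (type byte and field name) is identical whichever type is chosen, so it is left out. The helper name `bsonValueSize` is hypothetical.

```javascript
// Hypothetical helper: size in bytes of a value's BSON encoding,
// excluding the type byte and field name shared by every option.
function bsonValueSize(kind, value) {
  switch (kind) {
    case "string":
      // int32 length prefix + UTF-8 bytes + trailing 0x00
      return 4 + Buffer.byteLength(value, "utf8") + 1;
    case "objectId":
      return 12;
    case "int32":
      return 4;
    case "int64":
    case "double":
      return 8;
    default:
      throw new Error(`unknown kind: ${kind}`);
  }
}

console.log(bsonValueSize("string", "en"));                       // 7
console.log(bsonValueSize("string", "zh-Hant"));                  // 12
console.log(bsonValueSize("objectId"));                            // 12
console.log(bsonValueSize("double"));                              // 8
console.log(bsonValueSize("string", "4dd3d1d8e4b0c0ffee000001")); // 29
```

So for two-letter codes such as "en" or "fr", the string key is already smaller than either an ObjectId or an 8-byte number; a numeric code only saves space for longer keys, or when stored as a 4-byte int32 (NumberInt in the shell, since plain shell numbers are doubles).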

Chris Fulstow answered Oct 16 '22 at 07:10