Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JSON Schema with dynamic key field in MongoDB

Want to have a i18n support for objects stored in mongodb collection

currently our schema is like:

{
  _id: "id"
  name: "name"
  localization: [{
    lan: "en-US",
    name: "name_in_english"
  }, {
    lan: "zh-TW",
    name: "name_in_traditional_chinese"
  }]
}

but my thought is that field "lan" is unique, can I just use this field as a key, so the structure would be

{
  _id: "id"
  name: "name"
  localization: {
    "en-US": "name_in_english",
    "zh-TW": "name_in_traditional_chinese"
  }
}

which would be neater and easier to parse (just localization[language] would get the value i want for specific language).

But then the question is: Is this a good practice in storing data in MongoDB? And how to pass the json-schema check?

like image 332
fin Avatar asked Jul 26 '13 09:07

fin


2 Answers

It is not a good practice to have values as keys. The language codes are values and as you say you can not validate them against a schema. It makes querying against it impossible. For example, you can't figure out if you have a language translation for "nl-NL" as you can't compare against keys and neither is it possible to easily index this. You should always have descriptive keys.

However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).

I would suggest an alternative schema:

{
    your_id: "id_for_name"
    lan: "en-US",
    name: "name_in_english"
}
{
    your_id: "id_for_name"
    lan: "zh-TW",
    name: "name_in_traditional_chinese"
}

Now you can :

  • set an index on { your_id: 1, lan: 1 } for speedy lookups
  • query for each translation individually and just get that translation:
    db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
  • query for all the versions for each id using this same index:
    db.so.find( { your_id: "id_for_name" } )
  • and also much easier update the translation for a specific language:

    db.so.update(
        { your_id: "id_for_name", lan: 'en-US' }, 
        { $set: { name: "ooga" } } 
    )
    

Neither of those points are possible with your suggested schemas.

like image 80
Derick Avatar answered Oct 01 '22 21:10

Derick


Obviously the second schema example is much better for your task (of course, if lan field is unique as you mentioned, that seems true to me also).

Getting element from dictionary/associated array/mapping/whatever_it_is_called_in_your_language is much cheaper than scanning whole array of values (and in current case it's also much efficient from the storage size point of view (remember that all fields are stored in MongoDB as-is, so every record holds the whole key name for json field, not it's representation or index or whatever).

My experience shows that MongoDB is mature enough to be used as a main storage for your application, even on high-loads (whatever it means ;) ), and the main problem is how you fight database-level locks (well, we'll wait for promised table-level locks, it'll fasten MongoDB I hope a lot more), though data loss is possible if your MongoDB cluster is built badly (dig into docs and articles over Internet for more information).

As for schema check, you must do it by means of your programming language on application side before inserting records, yeah, that's why Mongo is called schemaless.

like image 28
Rostyslav Dzinko Avatar answered Oct 01 '22 21:10

Rostyslav Dzinko