Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB embedded vs array sub document performance

Given the below competing schemas with up to 100,000 friends I’m interested in finding the most efficient for my needs.

Doc1 (Index on user_id)

{
"_id" : "…",
"user_id" : "1",
friends : {
    "2" : {
        "id" : "2",
        "mutuals" : 3
    }
     "3" : {
         "id" : "3",
         "mutuals": "1"
    }
   "4" : {
         "id" : "4",
         "mutuals": "5"
    }
}
}

Doc2 (Compound multi key index on user_id & friends.id)

{
"_id" : "…",
"user_id" : "1",
friends : [
   {
        "id" : "2",
        "mutuals" : 3
    },
    {
         "id" : "3",
         "mutuals": "1"
    },
   {
         "id" : "4",
         "mutuals": "5"
    }
]}

I can’t seem to find any information on the efficiency of the sub field retrieval. I know that mongo implements data internally as BSON, so I’m wondering whether that means a projection lookup is a binary O(log n)?

Specifically, given a user_id to find whether a friend with friend_id exists, how would the two different queries on each schema compare? (Assuming the above indexes) Note that it doesn’t really matter what’s returned, only that not null is returned if the friend exists.

Doc1col.find({user_id : "…"}, {"friends.friend_id"})
Doc2col.find({user_id : "…", "friends.id" : "friend_id"}, {"_id":1})

Also of interest is how the $set modifier works. For schema 1,given the query Doc1col.update({user_id : "…"}, {"$set" : {"friends.friend_id.mutuals" : 5}), how does the lookup on the friends.friend_id work? Is this a O(log n) operation (where n is the number of friends)?

For schema 2, how would the query Doc2col.update({user_id : "…", "friends.id" : "friend_id"}, {"$set": {"friends.$.mutuals" : 5}) compare to that of the above?

like image 696
Nelson Shaw Avatar asked Nov 30 '12 02:11

Nelson Shaw


People also ask

When should we embedded a document with another in MongoDB?

An embedded, or nested, MongoDB Document is a normal document that's nested inside another document within a MongoDB collection. Embedded documents are particularly useful when a one-to-many relationship exists between documents.

What is embedded document in MongoDB?

MongoDB provides you a cool feature which is known as Embedded or Nested Document. Embedded document or nested documents are those types of documents which contain a document inside another document.

Can we store array of objects in MongoDB?

One of the benefits of MongoDB's rich schema model is the ability to store arrays as document field values. Storing arrays as field values allows you to model one-to-many or many-to-many relationships in a single document, instead of across separate collections as you might in a relational database.

How can you implement 1 to many relationships in MongoDB?

In MongoDB, one-to-one, one-to-many, and many-to-many relations can be implemented in two ways: Using embedded documents. Using the reference of documents of another collection.


1 Answers

doc1 is preferable if one's primary requirements is to present data to the ui in a nice manageable package. its simple to filter only the desired data using a projection {}, {friends.2 : 1}

doc2 is your strongest match since your use case does not care about the result Note that it doesn’t really matter what’s returned and indexing will speed up the fetch.

on top of that doc2 permits the much cleaner syntax

db.doc2.findOne({user_id: 1, friends.id : 2} )

versus

db.doc1.findOne({ $and : [{ user_id: 1 }, { "friends.2" : {$exists: true} }] })

on a final note, however, one can create a sparse index on doc1 (and use $exists) but your possibility of 100,000 friends -- each friend needed a sparse index -- makes that absurd. opposed to a reasonable number of entries say demographics gender [male,female], agegroups [0-10,11-16,25-30,..] or more impt things [gin, whisky, vodka, ... ]

like image 94
Gabe Rainbow Avatar answered Oct 23 '22 09:10

Gabe Rainbow