
MongoDB embedded vs. reference from performance perspective

I read that embedding is better from a performance point of view: "If performance is an issue, embed." (http://www.mongodb.org/display/DOCS/Schema+Design), and most guides say that "contains" relationships should be embedded.

However, I am not sure this is always the case. Suppose we have two objects: Blog and Post. A Blog contains Posts.

Embedding all posts in the blog document would have the following issues:

  1. Paging. Since it's not possible to filter embedded objects, we will always get all posts and need to page through them in the application.
  2. Filtering. Same as above: when searching for a word inside posts, it will not be possible to filter the embedded collection in MongoDB.
  3. Insert. I assume inserting into a collection is faster than inserting into an embedded array. Is this correct? Is it documented anywhere?
  4. Update. Same as above: updating a field in place inside a smaller document (Post) might be faster than updating the post in place inside the big Blog document. Is this correct?

Taking all of the above into account, I would put posts in a separate collection, with each post referencing its Blog. Is this the correct conclusion?

(Note: Please do not factor document size limit in the response, let's assume each blog will have at most 1000 posts)
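
For concreteness, the two layouts being compared could look roughly like the sketch below. It uses plain JavaScript objects rather than a live database, and every field name and value (`name`, `blogId`, the titles) is a hypothetical placeholder, not something from the original question.

```javascript
// Hypothetical documents; field names and values are illustrative only.

// Option A: posts embedded in the blog document.
const embeddedBlog = {
  _id: "blog1",
  name: "My blog",
  posts: [
    { title: "First post", body: "..." },
    { title: "Second post", body: "..." },
  ],
};

// Option B: posts in their own collection, each referencing its blog.
const referencedBlog = { _id: "blog1", name: "My blog" };
const referencedPosts = [
  { _id: "post1", blogId: "blog1", title: "First post", body: "..." },
  { _id: "post2", blogId: "blog1", title: "Second post", body: "..." },
];

// With option B, fetching a blog's posts is a second lookup by blogId.
const postsOfBlog1 = referencedPosts.filter(
  (p) => p.blogId === referencedBlog._id
);

console.log(embeddedBlog.posts.length, postsOfBlog1.length);
```

With option A a single read returns the blog and its posts together; with option B the posts are independent documents that can be queried, inserted and updated on their own.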

mbdev asked Jun 14 '11 14:06



2 Answers

As for 3 & 4, if you are inserting into a nested document, it is basically an update.

This can be very bad for performance. Inserts are generally appended to the end of the data, which is fast. Updates, on the other hand, can be much trickier.

If your update does not change the size of a document (meaning that you had a key/value pair and simply changed the value to a new value that takes up the same amount of space), you will be fine. The problem arises when you start modifying documents and adding new data to them.

The problem is that while MongoDB allots more space than it needs for each document, it may not be enough. If you insert a document that is 1k large, MongoDB may allot 1.5k for the document to ensure that minor changes to the document have enough space to grow. If you use more than the allocated space, MongoDB has to fetch the entire document and re-write it at the tail end of the data.

There is obviously a performance implication in fetching and re-writing the data which will be amplified by the frequency of such an operation. To make matters worse, when this happens you end up leaving holes or pockets of unused space in your data files.

These holes ultimately get loaded into memory along with the data, which means that you may end up using 2GB of RAM to store your data set while the data itself only takes up 1.5GB, because there are 0.5GB worth of pockets. This fragmentation can be avoided by doing inserts as opposed to updates. It can also be fixed by doing a database repair.
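
The relocation and fragmentation effect described above can be illustrated with a toy model. This is only a sketch in plain JavaScript, not MongoDB's actual storage engine: the `createStore`, `insert`, `grow` and `holeBytes` helpers are hypothetical, and the padding factor of 1.5 is simply taken from the 1k to 1.5k example in this answer.

```javascript
// Toy model of padded, append-only storage. NOT MongoDB's real storage engine.
const PADDING_FACTOR = 1.5; // assumption, from the 1k -> 1.5k example above

function createStore() {
  // Each slot: { id, offset, allocated, used, live }
  return { slots: [], end: 0 };
}

// Append a document at the tail, reserving extra room for growth.
function insert(store, id, size) {
  const allocated = Math.ceil(size * PADDING_FACTOR);
  store.slots.push({ id, offset: store.end, allocated, used: size, live: true });
  store.end += allocated;
}

// Grow a document by `extra` bytes. Returns true if it had to be moved.
function grow(store, id, extra) {
  const slot = store.slots.find((s) => s.live && s.id === id);
  const newSize = slot.used + extra;
  if (newSize <= slot.allocated) {
    slot.used = newSize; // still fits: updated in place
    return false;
  }
  slot.live = false; // the old slot becomes a hole
  insert(store, id, newSize); // the document is rewritten at the tail
  return true;
}

// Total bytes occupied by abandoned slots.
function holeBytes(store) {
  return store.slots
    .filter((s) => !s.live)
    .reduce((sum, s) => sum + s.allocated, 0);
}

const store = createStore();
insert(store, "blog1", 1000); // 1500 bytes reserved
const movedFirst = grow(store, "blog1", 400); // 1400 bytes: fits in place
const movedSecond = grow(store, "blog1", 400); // 1800 bytes: must move
console.log(movedFirst, movedSecond, holeBytes(store), store.end);
```

In this model the first growth fits inside the padding, while the second one forces a move and leaves a 1500-byte hole behind, which is the kind of pocket that inflates the data files beyond the size of the live data.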

In the next version of MongoDB there will be an online compaction function.

Bryan Migliorisi answered Oct 04 '22 05:10



1. Paging is possible with the $slice operator:

db.blogs.find({}, {posts:{$slice: [10, 10]}}) // skip 10, limit 10
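
To make the `[skip, limit]` form concrete, here is a plain-JavaScript model of what that projection returns for non-negative values. It is a sketch that runs without a database; the `slicePosts` helper and the sample blog with 25 posts are hypothetical.

```javascript
// Plain-JavaScript model of the projection {posts: {$slice: [skip, limit]}}
// for non-negative skip. The helper and sample data are hypothetical.
function slicePosts(blog, skip, limit) {
  return { ...blog, posts: blog.posts.slice(skip, skip + limit) };
}

const blog = {
  _id: "blog1",
  posts: Array.from({ length: 25 }, (_, i) => ({ title: `Post ${i + 1}` })),
};

// Second page of 10: skips posts 1-10 and keeps posts 11-20.
const page2 = slicePosts(blog, 10, 10);
// Third page: only 5 posts remain.
const page3 = slicePosts(blog, 20, 10);

console.log(page2.posts.map((p) => p.title));
console.log(page3.posts.length);
```

The rest of the blog document is returned unchanged; only the `posts` array is trimmed to the requested window.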

2. Filtering is also possible:

db.blogs.find({"posts.title":"Mongodb!"}, {posts:{$slice: 1}}) //take one post
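
One detail worth keeping in mind: the query condition selects whole blog documents, while `$slice: 1` keeps the first element of the `posts` array, which is not necessarily the post that matched. The sketch below models this in plain JavaScript with hypothetical data and helper names. MongoDB releases later than this answer added projection operators such as `$elemMatch` for returning the matching element; check the documentation for your version.

```javascript
// Hypothetical data: in blog1 the matching post is the second one.
const blogs = [
  { _id: "blog1", posts: [{ title: "Hello" }, { title: "Mongodb!" }] },
  { _id: "blog2", posts: [{ title: "Other" }] },
];

// Model of find({"posts.title": title}): selects whole blog documents.
const matchBlogs = (title) =>
  blogs.filter((b) => b.posts.some((p) => p.title === title));

// Model of the {posts: {$slice: 1}} projection: keeps the first post only.
const firstPostOnly = (b) => ({ ...b, posts: b.posts.slice(0, 1) });

// Keeping only the posts that matched, done here in application code.
const matchingPostsOnly = (b, title) => ({
  ...b,
  posts: b.posts.filter((p) => p.title === title),
});

const hits = matchBlogs("Mongodb!");
const sliced = hits.map(firstPostOnly);
const filtered = hits.map((b) => matchingPostsOnly(b, "Mongodb!"));

console.log(sliced[0].posts); // first post of the matching blog
console.log(filtered[0].posts); // the post that actually matched
```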

3, 4. In general, I think you are talking about a small performance difference. It's not rocket science; it's just a blog with at most 1000 posts.

You said:

Is this the correct conclusion?

No, not if you care about performance (although if the system is small, you can go with separate documents).

I ran a small performance test for points 3 and 4. Here are the results:

-----------------------------------------------------------
| Count   | Inserting posts | Adding to nested collection |
|---------|-----------------|-----------------------------|
| 1       | 1 ms            | 28 ms                       |
| 1000    | 81 ms           | 590 ms                      |
| 10000   | 759 ms          | 2723 ms                     |
-----------------------------------------------------------
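
To read the table more easily, the short sketch below derives the slowdown factor and the per-operation time from the figures above. It only does arithmetic on the reported numbers and does not re-run the benchmark; the variable names are illustrative.

```javascript
// Timings copied from the table above (milliseconds).
const results = [
  { count: 1, insertMs: 1, nestedMs: 28 },
  { count: 1000, insertMs: 81, nestedMs: 590 },
  { count: 10000, insertMs: 759, nestedMs: 2723 },
];

// Derive the nested/insert ratio and the per-operation cost for each row.
const summary = results.map((r) => ({
  count: r.count,
  ratio: r.nestedMs / r.insertMs,
  insertPerOpMs: r.insertMs / r.count,
  nestedPerOpMs: r.nestedMs / r.count,
}));

for (const s of summary) {
  console.log(
    `${s.count} posts: nested is ${s.ratio.toFixed(1)}x slower ` +
      `(${s.insertPerOpMs.toFixed(3)} ms vs ${s.nestedPerOpMs.toFixed(3)} ms per operation)`
  );
}
```

In these measurements, adding to the nested array is slower in every row, by a factor of roughly 28, 7.3 and 3.6 for 1, 1000 and 10000 posts respectively.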
Andrew Orsich answered Oct 04 '22 04:10
