
In MongoDB is it practical to keep all comments for a post in one document?

Tags:

mongodb

I've read in descriptions of document-based databases that you can, for example, embed all comments for a post in the same document as the post, like so:

{
   _id: "sdfdsfdfdsf",
   title: "post title",
   body: "post body",
   comments: [
      "comment 1 ......................................... end of comment",
      // ...
      "comment n"
   ]
}

I have a similar situation, where each comment could be as large as 8 KB and there could be as many as 30 of them per post.

Even though it's convenient to embed comments in the same document, I wonder whether large documents hurt performance, especially when the MongoDB server and the HTTP server run on separate machines and must communicate over a LAN.

Roman asked Jun 18 '12 05:06


4 Answers

I'm posting this answer after some of the others, so I will repeat some of the things already mentioned.

That said, there are a few things to take into account. Consider these three questions:

  1. Will you always require all comments every time you query for a post?
  2. Can you do without querying comments directly (e.g. fetching all comments by a specific user)?
  3. Will your system have relatively low usage?

If you can answer yes to all three, you can embed the comments array. In all other scenarios you will probably need a separate collection to store your comments.

First of all, you can actually update and remove comments atomically in a concurrency-safe way (see updates with positional operators), but there are some things you cannot do, such as index-based inserts.

The main concern with using embedded arrays for any sort of large collection is the move-on-update issue. MongoDB reserves a certain amount of padding per document (see db.col.stats().paddingFactor) to allow it to grow as needed. If the document runs out of this padding (and in your use case it often will), MongoDB has to move that ever-growing document around on disk. This makes updates an order of magnitude slower and is therefore a serious concern on high-throughput servers.

A related but slightly less critical issue is bandwidth. If you have no choice but to fetch the entire post with all its comments even though you're only displaying the first 10, you will waste quite a bit of bandwidth, which can be an issue in cloud environments especially (you can use $slice to avoid some of this).
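To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch using the figures from the question (30 comments of up to 8 KB each). The helper `firstN` is hypothetical and only mimics, on a plain array, what a `$slice` projection with a positive count returns; it does not talk to a database, and BSON overhead is ignored.

```javascript
// Rough payload arithmetic for an embedded comments array.
// Figures come from the question; BSON overhead is ignored (assumption).
const COMMENT_BYTES = 8 * 1024;   // up to 8 KB per comment
const COMMENTS_PER_POST = 30;     // up to 30 comments per post
const SHOWN = 10;                 // comments actually displayed

// Hypothetical helper: keeps the first n array elements, like {$slice: n}.
function firstN(arr, n) {
  return arr.slice(0, n);
}

const comments = Array.from({ length: COMMENTS_PER_POST }, (_, i) => "comment " + i);
const shown = firstN(comments, SHOWN);

const fullBytes = COMMENTS_PER_POST * COMMENT_BYTES;  // 245760 bytes
const slicedBytes = shown.length * COMMENT_BYTES;     // 81920 bytes
const wastedBytes = fullBytes - slicedBytes;          // 163840 bytes

console.log("full array: " + fullBytes / 1024 + " KB");    // 240 KB
console.log("first 10:   " + slicedBytes / 1024 + " KB");  // 80 KB
console.log("not needed: " + wastedBytes / 1024 + " KB");  // 160 KB
```

In the shell, the corresponding projection would look like db.posts.find({_id: postId}, {comments: {$slice: 10}}), where postId is a placeholder for the post's _id.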

If you do want to go embedded, here are your basic operations (postId stands for the _id of the post):

Add comment:

db.posts.update({_id: postId}, {$push: {comments: {commentId: "remon-923982", author: "Remon", text: "Hi!"}}})

Update comment:

db.posts.update({_id: postId, 'comments.commentId': "remon-923982"}, {$set: {'comments.$.text': "Hello!"}})

Remove comment:

db.posts.update({_id: postId, 'comments.commentId': "remon-923982"}, {$pull: {comments: {commentId: "remon-923982"}}})

All these operations are concurrency-safe because the update criteria are evaluated under the same (process-wide) write lock as the modification, so each update is applied atomically.

With all that said, you probably want a dedicated collection for your comments, but that comes with a second choice. You can either store each comment in a dedicated document or use comment buckets of, say, 20-30 comments each (described in detail here: http://www.10gen.com/presentations/mongosf2011/schemascale). Both have advantages and disadvantages, so it's up to you to see which approach fits best for what you want to do. I would go for buckets if the number of comments per post can exceed a couple of hundred, due to the O(N) performance of the skip(N) cursor method you'll need for paging them. In all other cases, just go with a comment-per-document approach. That is the most flexible when querying on comments for other use cases as well.
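The bucket idea can be sketched with plain arithmetic. This is a minimal illustration, assuming a fixed bucket size and a hypothetical per-post sequence number field called `bucket`; neither the field names nor the helper functions come from the linked presentation.

```javascript
// Bucketed comments: each bucket document holds up to BUCKET_SIZE comments.
// `postId` and `bucket` (0-based sequence number) are hypothetical field names.
const BUCKET_SIZE = 20;

// Which bucket, and which slot inside it, holds the n-th comment (0-based)?
function locateComment(n, bucketSize = BUCKET_SIZE) {
  return { bucket: Math.floor(n / bucketSize), offset: n % bucketSize };
}

// Split a flat list of comments into bucket documents for one post.
function toBuckets(postId, comments, bucketSize = BUCKET_SIZE) {
  const buckets = [];
  for (let i = 0; i < comments.length; i += bucketSize) {
    buckets.push({
      postId: postId,
      bucket: i / bucketSize,
      comments: comments.slice(i, i + bucketSize),
    });
  }
  return buckets;
}

const all = Array.from({ length: 450 }, (_, i) => "comment " + i);
const buckets = toBuckets("post-1", all);
const where = locateComment(405);

console.log(buckets.length);                               // 23 (22 full buckets, 1 with 10)
console.log(where);                                        // { bucket: 20, offset: 5 }
console.log(buckets[where.bucket].comments[where.offset]); // comment 405
```

With one document per comment, reaching comment 405 through skip() means walking past the 405 entries before it. With buckets, the page can be addressed by an equality match on the post and the bucket number, at the cost of more involved insert logic.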

Remon van Vliet answered Oct 10 '22 13:10


It greatly depends on the operations you want to allow, but a separate collection is usually better.

For instance, if you want to allow users to edit or delete comments, it is a very good idea to store comments in a separate collection, because these operations are hard or impossible to express with atomic modifiers alone, and state management becomes painful. The documentation also covers this.

A key issue with embedding comments is that you will have different writers. Normally, a blog post can be modified only by blog authors. With embedded comments, a reader also gets write access to the object, so to speak.

Code like this will be dangerous:

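var post = db.posts.findOne({ _id: 2332 });   // read the whole document
post.text = "foo";                            // modify it in the application
// at this moment, someone does a $push on the article's comments
db.posts.update({ _id: post._id }, post);     // write the whole (stale) document back
// now we've deleted that comment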
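
The race can be reproduced without a database. The sketch below is an in-memory stand-in (the `store` object and both helper functions are hypothetical, not a MongoDB API): the whole-document write silently discards a comment pushed in between, while a field-level change, the analogue of $set, does not.

```javascript
// In-memory stand-in for a posts collection (hypothetical, not a MongoDB API).
const store = { 2332: { _id: 2332, text: "old", comments: [] } };

// Returns a deep copy, the way a driver hands the application its own copy.
function findArticle(id) {
  return JSON.parse(JSON.stringify(store[id]));
}

// Analogue of {$push: {comments: c}}: touches only the comments array.
function pushComment(id, c) {
  store[id].comments.push(c);
}

// Unsafe: read, modify, then write back the whole document.
const post = findArticle(2332);
post.text = "foo";
pushComment(2332, "first!");          // another writer sneaks in here
store[2332] = post;                   // full overwrite with the stale copy
const lostAfterOverwrite = store[2332].comments.length;  // 0: comment lost

// Safe: change only the field being edited (analogue of $set).
pushComment(2332, "second!");
store[2332].text = "bar";
const keptAfterSet = store[2332].comments.length;        // 1: comment kept

console.log(lostAfterOverwrite, keptAfterSet);           // 0 1
```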

mnemosyn answered Oct 10 '22 12:10


For performance reasons it is best to avoid documents that can grow in size over time:

Padding Factors:

"When you update a document in MongoDB, the update occurs in-place if the document has not grown in size. If the document did grow in size, however, then it might need to be relocated on disk to find a new disk location with enough contiguous space to fit the new larger document. This can lead to problems for write performance if the collection has many indexes since a move will require updating all the indexes for the document."

http://www.mongodb.org/display/DOCS/Padding+Factor
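
As a rough intuition for why a steadily growing document is costly, here is a toy model. It is a deliberate simplification and not MongoDB's actual allocator: it assumes a record is allocated at size * paddingFactor and is moved (and re-padded) whenever an append no longer fits. The padding factor of 1.5 and the 2 KB starting size are made-up illustrative values; the 8 KB comments and the count of 30 come from the question.

```javascript
// Toy model only: counts how often a growing document outgrows its record.
// Assumption: allocated space = current size * paddingFactor after each move.
function countMoves(initialBytes, appendBytes, appends, paddingFactor) {
  let size = initialBytes;
  let allocated = Math.ceil(size * paddingFactor);
  let moves = 0;
  for (let i = 0; i < appends; i++) {
    size += appendBytes;
    if (size > allocated) {
      moves++;                                    // document no longer fits: relocate
      allocated = Math.ceil(size * paddingFactor);
    }
  }
  return moves;
}

const KB = 1024;
// A 2 KB post that receives 30 embedded comments of 8 KB each.
const movesEmbedded = countMoves(2 * KB, 8 * KB, 30, 1.5);
// The same post when comments live elsewhere and the document never grows.
const movesFixedSize = countMoves(2 * KB, 0, 30, 1.5);

console.log(movesEmbedded, movesFixedSize);
```

Under these assumptions the embedded document is relocated several times over its lifetime, while the fixed-size document is never moved; each relocation also means touching every index entry that points at the document, as the quoted passage notes.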

snoopdave answered Oct 10 '22 13:10


If you always retrieve a post with all its comments, why not?

If you don't, or if you retrieve comments in a query other than by post (i.e. viewing all of a user's comments on the user's page), then probably not, since queries would become much more complicated.
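
To make the difference concrete, the following in-memory sketch (plain arrays standing in for collections; all field names are hypothetical) fetches one user's comments under both layouts. With embedding, every post has to be opened and its array filtered; with a separate collection it is a single filter on `author`.

```javascript
// Plain arrays stand in for collections; field names are hypothetical.
const posts = [
  { _id: 1, comments: [{ author: "ann", text: "a1" }, { author: "bob", text: "b1" }] },
  { _id: 2, comments: [{ author: "ann", text: "a2" }] },
];

// Embedded layout: walk every post, then filter each comments array.
function commentsByUserEmbedded(allPosts, author) {
  return allPosts.flatMap((p) =>
    p.comments
      .filter((c) => c.author === author)
      .map((c) => ({ postId: p._id, text: c.text }))
  );
}

// Separate layout: one flat collection of comments, each carrying its postId.
const comments = posts.flatMap((p) =>
  p.comments.map((c) => ({ postId: p._id, author: c.author, text: c.text }))
);

function commentsByUserSeparate(allComments, author) {
  return allComments
    .filter((c) => c.author === author)
    .map((c) => ({ postId: c.postId, text: c.text }));
}

const viaEmbedded = commentsByUserEmbedded(posts, "ann");
const viaSeparate = commentsByUserSeparate(comments, "ann");

// Both layouts yield the same two comments; only the amount of work differs.
console.log(JSON.stringify(viaEmbedded) === JSON.stringify(viaSeparate)); // true
```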

Jonathan Ong answered Oct 10 '22 12:10