
Does performing a partial update on a MongoDB document in WiredTiger provide any advantage over a full document update?

I'm using the Java driver (although this question is not language specific) to write partial updates to MongoDB documents. I do this because the MMAPv1 storage engine edits documents in place (in memory), so partial updates perform better. This adds considerable development complexity, though: the alternative would be to save the entire document at once and not worry about exactly which fields changed. After updating to WiredTiger I learned that this newer storage engine does not edit documents in place (in memory) but instead allocates new memory for each write (unclear if this means full copy of the document or just diff). Does this mean that it makes no performance difference whether I do a full document write vs a partial one?

nofunatall asked Jun 17 '15 04:06


1 Answer

"After updating to WiredTiger I learned that this newer storage engine does not edit documents in place (in memory) but instead allocates new memory for each write (unclear if this means full copy of the document or just diff)."

WiredTiger uses Multiversion Concurrency Control (MVCC) to maintain multiple views of data for the lifetime of readers. WiredTiger’s in-memory format is different from the on-disk format: in-memory it stores diffs to a document, but a full version of the document is constructed when flushed to the data files as part of periodic checkpoints.

"Does this mean that it makes no performance difference whether I do a full document write vs a partial one?"

Irrespective of how different MongoDB storage engines handle persisting updates to disk, there are still performance benefits in using partial updates rather than full updates where possible (particularly if you are setting field values which are small relative to overall document size).

For example, consider:

  • Network traffic for document updates (any storage engine)
  • Size of entries in the journal (any storage engine)
  • Size of entries in the replication oplog (any storage engine)
  • Size of in-memory versions of updates (WiredTiger)
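To make the size argument concrete, here is a rough sketch in plain Java (no driver involved). It uses JSON text length as a stand-in for BSON wire size, which is only an approximation, and the document shape, field names, and the `UpdatePayloadSketch` class are all hypothetical. With the real Java driver the two cases would roughly correspond to `replaceOne` versus `updateOne` with `Updates.set`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Rough illustration only: JSON text length stands in for BSON wire size.
class UpdatePayloadSketch {

    // Serialize a flat map of string fields to JSON-like text
    // (no escaping; sufficient for this demo data).
    static String toJson(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    static int byteSize(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical document: one small field plus one large field.
    static Map<String, String> sampleDocument() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("_id", "user-1");
        doc.put("status", "active");
        doc.put("bio", "x".repeat(10_000));
        return doc;
    }

    // Payload when the whole document is re-sent after changing "status".
    static String fullUpdatePayload(Map<String, String> doc, String newStatus) {
        Map<String, String> copy = new LinkedHashMap<>(doc);
        copy.put("status", newStatus);
        return toJson(copy);
    }

    // Payload for a $set-style partial update that touches only "status".
    static String partialUpdatePayload(String newStatus) {
        return "{\"$set\":{\"status\":\"" + newStatus + "\"}}";
    }

    public static void main(String[] args) {
        int full = byteSize(fullUpdatePayload(sampleDocument(), "inactive"));
        int partial = byteSize(partialUpdatePayload("inactive"));
        System.out.println("full update bytes:    " + full);
        System.out.println("partial update bytes: " + partial);
    }
}
```

The same ratio carries over, approximately, to whatever is sent over the network and recorded for the change, which is why the benefit grows as the changed fields get smaller relative to the document.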

If you are sending full document updates each time, you also create scenarios where the order in which updates reach the server is significant, even when the changes are to distinct sets of fields. You could add application logic such as optimistic versioning to ensure you don't accidentally overwrite field values, but this may add unnecessary complexity depending on your use case.
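The ordering problem can be sketched with an in-memory stand-in for a server-side document. This is not the MongoDB driver: the `LostUpdateSketch` class, its methods, and the field names are hypothetical, and the version counter is just one simple way to model optimistic versioning.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for one stored document; not the MongoDB driver.
class LostUpdateSketch {
    private Map<String, Object> stored;
    private long version = 0;

    LostUpdateSketch(Map<String, Object> initial) {
        this.stored = new HashMap<>(initial);
    }

    // Snapshot a client would read before editing.
    Map<String, Object> read() { return new HashMap<>(stored); }

    long version() { return version; }

    // Full-document write: whatever the client sends replaces everything.
    void replace(Map<String, Object> doc) {
        stored = new HashMap<>(doc);
        version++;
    }

    // Partial write: only the named field changes (similar in spirit to $set).
    void setField(String field, Object value) {
        stored.put(field, value);
        version++;
    }

    // Full-document write guarded by optimistic versioning: rejected if stale.
    boolean replaceIfVersion(long expectedVersion, Map<String, Object> doc) {
        if (expectedVersion != version) return false;
        replace(doc);
        return true;
    }

    static LostUpdateSketch fresh() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Ann");
        doc.put("email", "old@example.com");
        return new LostUpdateSketch(doc);
    }

    // Two clients read the same snapshot, edit distinct fields, then replace.
    static Map<String, Object> fullReplaceScenario() {
        LostUpdateSketch server = fresh();
        Map<String, Object> a = server.read();
        Map<String, Object> b = server.read();
        a.put("name", "Anna");
        b.put("email", "new@example.com");
        server.replace(a);
        server.replace(b); // b still carries the old name, so a's change is lost
        return server.read();
    }

    // The same two edits sent as field-level updates.
    static Map<String, Object> partialScenario() {
        LostUpdateSketch server = fresh();
        server.setField("name", "Anna");
        server.setField("email", "new@example.com");
        return server.read();
    }

    // With a version check, the second stale replace is rejected instead.
    static boolean staleReplaceAccepted() {
        LostUpdateSketch server = fresh();
        long seen = server.version();
        Map<String, Object> a = server.read();
        Map<String, Object> b = server.read();
        a.put("name", "Anna");
        b.put("email", "new@example.com");
        server.replaceIfVersion(seen, a);
        return server.replaceIfVersion(seen, b);
    }

    public static void main(String[] args) {
        System.out.println("full replace: " + fullReplaceScenario());
        System.out.println("partial:      " + partialScenario());
        System.out.println("stale replace accepted: " + staleReplaceAccepted());
    }
}
```

In the full-replace scenario the second write silently reverts the first client's change, while the field-level updates commute because they touch distinct fields. The versioned variant avoids the silent overwrite, but the rejected client then has to re-read and retry, which is the extra application complexity mentioned above.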

Stennie answered Sep 20 '22 03:09
