Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to choose from MMAPV1, WiredTiger or In-Memory StorageEngine for MongoDB?

Tags:

In MongoDb Documentation 3.2 I saw that they support 3 Storage Engine, MMAPV1, WiredTiger, In-Memory, it is very confusing which one to choose.

I have the sensation from the description that WiredTiger is better than MMAPV1, but in other sources they say that MMAPV1 is better for heavy reads... and WiredTiger for heavy writes...

Is there some constraints when to choose one over the other ? Can someone suggest some best practices for example

when I have this type of application usually is best this , else choose an other...

like image 224
Alexandru Olaru Avatar asked Jun 23 '16 07:06

Alexandru Olaru


People also ask

What is WiredTiger storage engine in MongoDB?

WiredTiger is the default storage engine starting in MongoDB 3.2. It is well-suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features. In MongoDB Enterprise, WiredTiger also supports Encryption at Rest.

Is MongoDB in-memory db?

MongoDB is not an in-memory database. Although it can be configured to run that way. But it makes liberal use of cache, meaning data records kept memory for fast retrieval, as opposed to on disk.

Which is the easy way to find the storage engine currently used in MongoDB db?

Easiest way to find the storage engine being used currently in from mongo console. WireTiger Storage engine is being used. to get all the configuration details of wiredTiger.


2 Answers

This is from personal experience, however please have a look at this blog entry it explains very well different types of the engines: Mongo Blog v3

Wired Tiger vs MMAPv1

Comparing the MongoDB WiredTiger and MMAPv1 storage engines.Higher Performance & Efficiency Between 7x and 10x Greater Write Performance MongoDB 3.0 provides more granular document-level concurrency control, delivering between 7x and 10x greater throughput for most write-intensive applications, while maintaining predictable low latency.

For me choice was very simple, I needed document level locks which makes WiredTiger ideal choice, we don't have Enterprise version of mongo hence in memory engine is not available. MMAPv1 Btree is very basic technique to map memory to hard drive and not very efficient.

The MMAP storage engine uses a process called “record allocation” to grab disk space for document storage. All records are contiguously located on disk, and when a document becomes larger than the allocated record, it must allocate a new record. New allocations require moving a document and updating all indexes that refer to the document, which takes more time than in-place updates and leads to storage fragmentation. Furthermore, MMAPv1 in it’s current iterations usually leads to high space utilization on your filesystem due to over-allocation of record space and it’s lack of support for compression. As mentioned previously, a storage engine’s locking scheme is one of the most important factors in overall database performance. MMAPv1 has collection-level locking – meaning only one insert, update or delete operation can use a collection at a time. This type of locking scheme creates a very common scenario in concurrent workloads, where update/delete/insert operations are always waiting for the operation(s) in front of them to complete. Furthermore, oftentimes those operations are flowing in more quickly than they can be completed in serial fashion by the storage engine. To put it in context, imagine a giant supermarket on Sunday afternoon that only has one checkout line open: plenty of customers, but low throughput!

Everyone has different requirements, but for most cases WiredTiger would be ideal choice the fact that it makes atomic operations on document level and not collection level has a great advantage, you simply can't beat that.

More reads and not a lot of writes

If reading is your main concern here is one way to address that.

You can tweak Mongo Driver Read Preference Modes in the following way:

  1. Setup Replica Set, say 1 master and 3 secondaries.
  2. Set write concern to majority this would make write a bit slower (trade off).
  3. Set read preference to secondary.

This setup will perform very well when you have a lot of reads, but as a tradeoff write would be slower. However throughput of read data would be great.

I hope this helps if you have additional questions add them as a comment and I will try to address it in this answer.

Also you can check MMAPv1 vs WiredTiger review and notice how he changed his mind from MMAPv1 to WiredTiger. The seller is document locking that performance you just can't beat.

For new projects, I use WiredTiger now. Since a migration from a compressed to an uncompressed WiredTiger storage is rather easy, I tend to start with compression to enhance the CPU utilization ("get more bang for the buck"). Should the compression have a noticeable impact on performance or UX, I migrate to uncompressed WiredTiger.

MongoDB database profiler

Best way of determining your database needs is to setup test cluster and run application on it with MongoDB profiler Like most database profilers, the MongoDB profiler can be configured to only write profile information about queries that took longer than a given threshold. So once you know slow queries you can figure out if it reads vs writes or cpu vs ram and go from there.

like image 122
Tim Avatar answered Sep 25 '22 13:09

Tim


You should use a replica set consisting of both in-memory and WiredTiger storage engines. And you should shard your MongoDB in such a way that the most frequented data should be accessed by the in-memory storage engine and rest uses WiredTiger storage engine.

After acquiring WiredTiger in 2014, MongoDB introduced this storage engine as their default storage engine from version 3.2. Thereafter, they themselves started to encourage users to use WiredTiger because of its following advantages over MMAPV1:

  • WiredTiger uses document level concurrency whereas MMAPV1 uses collection level locking. That means multiple users can write to a collection simultaneously using WiredTiger but not using MMAPV1.
  • Since WiredTiger manages its own memory, it can use compression whereas MMPAV1 doesn't have any such feature.
  • WiredTiger doesn't allow any in-place updates. Thus, eventually it reclaims the space that is no longer used.

Only advantages of MMPAV1 over WiredTiger which I have found so far is:

  • WiredTiger is not available on Solaris platform whereas MMPAV1 is.
  • Even while updating a big document with only a single element, WiredTiger re-writes the whole document making it slower.

So you can always left MMPAV1 out while choosing your storage engine. Now let's come to the point of in-memory storage engine. Starting in MongoDB Enterprise version 3.2.6, the in-memory storage engine is part of general availability (GA) in the 64-bit builds.

It has the following advantages over the storage engines:

  • Similar to WiredTiger, the in-memory storage engine also allow document level concurrency.
  • The in-memory storage engine is lot faster than others.

    By avoiding disk I/O, the in-memory storage engine allows for more predictable latency of database operations.

But this storage engine has quite a few disadvantages as well:

  • The in-memory storage engine does not persist data after process shutdown.

  • If your dataset is way too large, then in-memory engine is not a good option.

    In-memory storage engine requires that all its data (including oplog if mongod is part of replica set, etc.) fit into the specified --inMemorySizeGB command-line option or storage.inMemory.engineConfig.inMemorySizeGB setting.

Check the MongoDB Manual for example Deployment Architectures using in-memory storage engine.

like image 34
Arka Ghosh Avatar answered Sep 24 '22 13:09

Arka Ghosh