I'm trying to understand the behavior of reads in a MongoDB replica set. In particular, I have an environment with a high rate of reads, a low rate of writes, and a relatively small data set (say, less than 8 GB). I have a 3-node replica set.
I read this document:
http://docs.mongodb.org/manual/core/read-preference/
In particular:
primary: Default mode. All operations read from the current replica set primary.
primaryPreferred: In most situations, operations read from the primary but if it is unavailable, operations read from secondary members.
secondary: All operations read from the secondary members of the replica set.
secondaryPreferred: In most situations, operations read from secondary members but if no secondary members are available, operations read from the primary.
nearest: Operations read from the nearest member of the replica set, irrespective of the member's type.
So my understanding is that reads by default go to the primary. There are read preferences that allow reading from secondaries (secondary and secondaryPreferred); in these cases stale data may be served.
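For illustration, here is a minimal sketch with the MongoDB Java driver showing how secondary reads are opted into per collection; the hostnames, database, and collection names are placeholders:

```java
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ReadPreferenceExample {
    public static void main(String[] args) {
        // Placeholder hostnames for a 3-member replica set.
        try (MongoClient client = MongoClients.create(
                "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0")) {
            MongoDatabase db = client.getDatabase("test");

            // Default behavior: reads go to the primary.
            MongoCollection<Document> fromPrimary = db.getCollection("items");

            // Opt in to secondary reads; these may return stale data.
            MongoCollection<Document> fromSecondaries = fromPrimary
                    .withReadPreference(ReadPreference.secondaryPreferred());

            System.out.println(fromSecondaries.find().first());
        }
    }
}
```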
It seems to me that it would be preferable to distribute the reads across both the primary and the secondaries, so that I can make the best use of all 3 machines. But I don't really see this as an option. The following statement in particular perplexes me:
If read operations account for a large percentage of your application’s traffic, distributing reads to secondary members can improve read throughput. However, in most cases sharding provides better support for larger scale operations, as clusters can distribute read and write operations across a group of machines.
However, in the case of a relatively small data set, sharding simply doesn't make sense. Can someone shed some light on the right configuration?
What is MongoDB Replication? In simple terms, MongoDB replication is the process of maintaining a copy of the same data set on more than one MongoDB server. This is achieved by using a replica set: a group of mongod instances that maintain the same data set.
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments. This section introduces replication in MongoDB as well as the components and architecture of replica sets.
Heartbeats. Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
I had asked this question in a MongoDB forum, and this was their answer:
TL;DR: Use nearest.
Indeed, sharding your cluster would definitely solve the problem, as it would force you to split your data set into pieces (shards), and your read and write operations would be distributed evenly by the mongos servers - granted that you chose a good shard key. But, as you found out, it doesn't really make sense for a relatively small data set, and it would be more costly.
Our documentation doesn’t really reveal all the magic behind the “nearest” option, but there is actually a round-robin algorithm implemented behind it. In our specifications, you can read more about it - especially about the options that you can set to tune the round-robin algorithm.
https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst#nearest
https://github.com/mongodb/specifications/blob/master/source/driver-read-preferences.rst#nearest
To distribute reads across all members evenly regardless of RTT, users should use mode ‘nearest’ without tag_sets or maxStalenessSeconds and set localThresholdMS very high so that all servers fall within the latency window.
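A minimal sketch of that advice with the MongoDB Java driver; readPreference and localThresholdMS are standard connection string options, while the hostnames and the 60000 ms threshold are illustrative values:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class NearestEverywhere {
    public static void main(String[] args) {
        // nearest + a very high localThresholdMS widens the latency window so
        // that all members fall inside it, spreading reads across the set.
        try (MongoClient client = MongoClients.create(
                "mongodb://host1:27017,host2:27017,host3:27017/"
                        + "?replicaSet=rs0&readPreference=nearest&localThresholdMS=60000")) {
            System.out.println(client.getDatabase("test").listCollectionNames().first());
        }
    }
}
```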
Here is more documentation about the ping times.
Especially this part:
Once the driver or mongos has found a list of candidate members based on mode and tag sets, determine the “nearest” member as the one with the quickest response to a periodic ping command. (The driver already knows ping times to all members, see “assumptions” above.) Choose a member randomly from those at most 15ms “farther” than the nearest member. The 15ms cutoff is overridable with “secondaryAcceptableLatencyMS”.
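To make that selection rule concrete, here is an illustrative sketch, not driver internals; the Member record and select method are hypothetical names, and the logic simply picks a random member from those inside the latency window:

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class LatencyWindowSelection {
    // Hypothetical stand-in for a replica set member and its measured ping time.
    record Member(String host, long pingMillis) {}

    private static final Random RANDOM = new Random();

    // localThresholdMS defaults to 15 in the drivers; a very large value
    // makes every member fall within the window.
    static Member select(List<Member> candidates, long localThresholdMS) {
        long nearest = candidates.stream()
                .mapToLong(Member::pingMillis)
                .min()
                .orElseThrow();
        List<Member> window = candidates.stream()
                .filter(m -> m.pingMillis() <= nearest + localThresholdMS)
                .collect(Collectors.toList());
        // Choose randomly among members at most localThresholdMS "farther"
        // than the nearest one.
        return window.get(RANDOM.nextInt(window.size()));
    }

    public static void main(String[] args) {
        List<Member> members = List.of(
                new Member("host1", 3), new Member("host2", 9), new Member("host3", 40));
        System.out.println(select(members, 15).host()); // host1 or host2, never host3
    }
}
```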
Also, the more RAM you have, the fewer documents will need to be retrieved from disk. If your working set is large, you should consider adding some RAM to reduce the IOPS and overall latency.
Adding to what has been mentioned above: the terminology is inconsistent in places, but https://www.mongodb.com/blog/post/server-selection-next-generation-mongodb-drivers clarifies that
The ‘localThresholdMS’ variable used to be called secondaryAcceptableLatencyMS, but was renamed for more consistency with mongos (which already had localThreshold as a configuration option) and because it no longer applies only to secondaries.
localThresholdMS is a connection string (URI) option; docs can be found at https://mongodb.github.io/mongo-java-driver/4.1/apidocs/mongodb-driver-core/com/mongodb/ConnectionString.html
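For completeness, a small sketch that parses the option with the linked ConnectionString class and applies it to client settings; the URI values are placeholders:

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class LocalThresholdOption {
    public static void main(String[] args) {
        // localThresholdMS is parsed as a URI option; the hostname is a placeholder.
        ConnectionString uri = new ConnectionString(
                "mongodb://host1:27017/?readPreference=nearest&localThresholdMS=1000");

        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(uri)
                .build();
        try (MongoClient client = MongoClients.create(settings)) {
            // The client now selects among members within a 1000 ms latency window.
        }
    }
}
```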