I am trying to make sure that I understand what happens when I add a new shard (replica set) to an existing sharded cluster. When I add these new members and Mongo sees that a new shard is available, it then starts to re-arrange the chunks so that it can take advantage of the new members, correct? What sort of impacts do you get when this happens? As always I would assume you want to try and add these members as soon as you start to see unfavorable performance numbers (if other tuning options are not helping).
Just wanted to get a better understanding of what happens when you add a shard to a cluster that already exists.
Thanks,
S
When a new shard is added to a sharded cluster, every other shard will donate chunks to it. A chunk is a contiguous range of a sharded collection's data. As the existing shards move data to the new shard, their own data footprint shrinks. The process responsible for distributing the data is called the balancer.
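If you want to watch this happening, the shell has helpers for inspecting how data is spread across the shards. A minimal sketch - the "mydb.mycoll" namespace below is just a placeholder for one of your own sharded collections:
// connect to a mongos
// overall sharding status, including chunk counts per shard
sh.status()
// per-shard data/document breakdown for a single sharded collection
use mydb
db.mycoll.getShardDistribution()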
Each shard should be a replica set; if a specific mongod instance fails, the remaining replica set members will elect a new primary and continue operating. However, if an entire shard is unreachable or fails for some reason, that shard's data will be unavailable.
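To see that failover behavior for yourself, you can inspect the member states of a shard's replica set. A small sketch, assuming you are connected directly to one member of that replica set:
// print each member's current state (PRIMARY, SECONDARY, etc.)
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});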
When you add a shard to an existing cluster, it automatically becomes the shard with the lowest number of chunks for every sharded collection. That means it will be the default target for migrations (from the shard with the highest number of chunks) until things even out. However, each shard primary (which is responsible for the migrations) can only take part in one migration at a time, so the balancing is going to take a while, especially if the cluster is under load.
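As a sketch of what that looks like in practice - the shard name, host and namespace below are placeholders, and note that older releases key the config.chunks collection by namespace while newer ones key it by collection UUID:
// on a mongos: add the new shard (replica set name / seed host)
sh.addShard("rs3/newhost.example.net:27018")
// then watch the chunk counts converge per shard for a given collection
use config
db.chunks.aggregate([
    { $match: { ns: "mydb.mycoll" } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } },
    { $sort: { chunks: 1 } }
])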
In terms of the migrations themselves, you are already seeing them in your current cluster, so that is how to judge their impact. You can view recent migrations in the logs, or you can take a look at the changelog (a 10MB capped collection that contains the most recent migrations, splits, etc.):
// connect to a mongos, switch to the config DB
use config
// look at the changelog
db.changelog.find()
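If the raw changelog is noisy, you can narrow it down to migrations; a quick sketch, using the field names as they appear in the changelog documents:
// only the most recent chunk moves, newest first
db.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(10)
// or a quick count of changelog events by type
db.changelog.aggregate([{ $group: { _id: "$what", count: { $sum: 1 } } }])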
In terms of what operations happen when moving a chunk:
1. The documents in the chunk are read on the source shard and inserted on the target shard.
2. Once the copy is complete, the cluster metadata on the config servers is updated so the chunk points at the target shard.
3. The documents in the chunk are then deleted from the source shard.
Step 3 is a delete, which requires a write lock on the source shard, but it should be quite fast - the documents are already in memory from the migration.
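If you ever want to trigger a single migration yourself - for example to gauge the impact of one chunk move rather than waiting for the balancer - there is a moveChunk command. A sketch with placeholder namespace, shard key field and shard name:
// on a mongos: move the chunk containing the given shard key value to a named shard
db.adminCommand({
    moveChunk: "mydb.mycoll",
    find: { myShardKey: 42 },
    to: "rs3"
})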
One other impact of increasing the frequency of the migrations is that the shard version is going to be updated more frequently - in particular the major shard version - so that each mongos has an up-to-date mapping of chunks to shards.
That means that you will see more logged messages about the mongos needing to refresh its config and update its shard version. It may also be a good idea to run the flushRouterConfig command before you kick off long-running operations like Map/Reduce or findAndModify.
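flushRouterConfig is run against each mongos you are about to use; a minimal sketch:
// force this mongos to reload the cluster metadata from the config servers
db.adminCommand({ flushRouterConfig: 1 })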
If your shards have periods of low usage, the migrations will complete more quickly during those windows, and you can also use the balancer window option to restrict balancing to certain times if you do notice a significant impact.
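The balancer window is set in the config database's settings collection. A sketch, assuming you only want balancing between 23:00 and 06:00 (the window values are placeholders, interpreted on the server's clock):
// on a mongos, switch to the config DB
use config
// only allow balancing between 23:00 and 06:00
db.settings.update(
    { _id: "balancer" },
    { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
    { upsert: true }
)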
As always I would assume you want to try and add these members as soon as you start to see unfavorable performance numbers
It's been my experience that you want to add a shard in anticipation of increased traffic, especially if the number of shards is low (< 6 or so). Migrating data to a new shard will increase IO on the existing nodes, and it will also increase network traffic.
So if you're already experiencing IO issues, adding a shard is just going to make it worse. You may end up "baby-sitting" the migrations or using the balancer window option. In fact, the existence of the balancer window option should tell you something about the intensity of the balancing process.
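"Baby-sitting" the migrations usually just means pausing and resuming the balancer around your busy periods; the shell helpers for that are:
// check whether the balancer is currently enabled
sh.getBalancerState()
// disable it while the cluster is busy (waits for any in-flight migration to finish)
sh.stopBalancer()
// ...and turn it back on afterwards
sh.startBalancer()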
What sort of impacts do you get when this happens?
The other unusual side effect here is that data not normally in memory may be pulled into memory. For example, if you have historical data that sits untouched most of the day, it can be pulled in to be moved even though your clients are not actively reading it.
Again, this will tie back to IO and my comments above.
When I add these new members and Mongo sees that a new shard is available, it then starts to re-arrange the chunks...
Note that this only happens for collections that are sharded (and therefore have a shard key). Unsharded collections don't move at all. This can fly under the radar until traffic starts accumulating on one shard for no obvious reason.
For data that is unsharded, you may want to keep it on a separate Replica Set to ensure that your shards behave as expected.
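Unsharded collections live on their database's primary shard, so one way to do this is to point that primary at a shard you have set aside for the purpose. A sketch, where "mydb" and "shardForUnsharded" are placeholder names; be aware that this copies all of the database's unsharded data to the target shard:
// on a mongos: make the named shard the primary shard for this database
db.adminCommand({ movePrimary: "mydb", to: "shardForUnsharded" })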