I am trying to make sure that I understand what happens when I add a new shard (replica set) to an existing sharded cluster. When I add these new members and Mongo sees that a new shard is available, it then starts to re-arrange the chunks so that it can take advantage of the new members, correct? What sort of impacts do you get when this happens? As always I would assume you want to try and add these members as soon as you start to see unfavorable performance numbers (if other tuning options are not helping).
Just wanted to get a better understanding of what happens when you add a shard to a cluster that already exists.
Thanks,
S
When a new shard is added to a sharded cluster, every other shard will donate chunks to it. A chunk is a contiguous range of a sharded collection's data. As the existing shards move data to the new shard, their own data footprint shrinks. The process responsible for distributing the data is called the balancer.
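If you want to watch this happening, the shell has helpers for inspecting how data is spread across the shards. A minimal sketch - the "mydb.mycoll" namespace below is just a placeholder for one of your own sharded collections:
// connect to a mongos
// overall sharding status, including chunk counts per shard
sh.status()
// per-shard data/document breakdown for a single sharded collection
use mydb
db.mycoll.getShardDistribution()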
Each shard should be a replica set; if a specific mongod instance fails, the remaining replica set members will elect a new primary and continue operating. However, if an entire shard is unreachable or fails for some reason, that shard's data will be unavailable.
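To see that failover behavior for yourself, you can inspect the member states of a shard's replica set. A small sketch, assuming you are connected directly to one member of that replica set:
// print each member's current state (PRIMARY, SECONDARY, etc.)
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});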
When you add a shard to an existing cluster, it automatically becomes the shard with the lowest number of chunks for every sharded collection. That means it will be the default target for migrations (from the shard with the highest number of chunks) until things even out. However, each shard primary (which is responsible for the migrations) can only take part in one migration at a time, so the balancing is going to take a while, especially if the cluster is under load.
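As a sketch of what that looks like in practice - the shard name, host and namespace below are placeholders, and note that older releases key the config.chunks collection by namespace while newer ones key it by collection UUID:
// on a mongos: add the new shard (replica set name / seed host)
sh.addShard("rs3/newhost.example.net:27018")
// then watch the chunk counts converge per shard for a given collection
use config
db.chunks.aggregate([
    { $match: { ns: "mydb.mycoll" } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } },
    { $sort: { chunks: 1 } }
])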
In terms of the migrations themselves, you are already seeing them in your current cluster, so that is how to judge their impact. You can view recent migrations in the logs, or you can take a look at the changelog (a 10MB capped collection that contains the most recent migrations, splits, etc.):
// connect to a mongos, switch to the config DB
use config
// look at the changelog
db.changelog.find()
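If the raw changelog is noisy, you can narrow it down to migrations; a quick sketch, using the field names as they appear in the changelog documents:
// only the most recent chunk moves, newest first
db.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(10)
// or a quick count of changelog events by type
db.changelog.aggregate([{ $group: { _id: "$what", count: { $sum: 1 } } }])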
In terms of what operations happen when moving a chunk:
1. The documents in the chunk are read on the source shard and inserted on the target shard.
2. Once the copy is complete, the cluster metadata on the config servers is updated so the chunk points at the target shard.
3. The documents in the chunk are then deleted from the source shard.
Step 3 is a delete, which requires a write lock on the source shard, but it should be quite fast - the documents are already in memory from the migration.
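If you ever want to trigger a single migration yourself - for example to gauge the impact of one chunk move rather than waiting for the balancer - there is a moveChunk command. A sketch with placeholder namespace, shard key field and shard name:
// on a mongos: move the chunk containing the given shard key value to a named shard
db.adminCommand({
    moveChunk: "mydb.mycoll",
    find: { myShardKey: 42 },
    to: "rs3"
})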
One other impact of increasing the frequency of the migrations is that the shard version is going to be updated more frequently - in particular the major shard version - so that each mongos has an up-to-date mapping of chunks to shards.
That means that you will see more logged messages about the mongos needing to refresh its config and update its shard version. It may also be a good idea to run the flushRouterConfig command before you kick off long-running operations like Map/Reduce or findAndModify.
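flushRouterConfig is run against each mongos you are about to use; a minimal sketch:
// force this mongos to reload the cluster metadata from the config servers
db.adminCommand({ flushRouterConfig: 1 })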
If your shards have periods of low usage, the migrations will complete more quickly during those windows, and you can also use the balancer window option to restrict balancing to certain times if you do notice a significant impact.
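The balancer window is set in the config database's settings collection. A sketch, assuming you only want balancing between 23:00 and 06:00 (the window values are placeholders, interpreted on the server's clock):
// on a mongos, switch to the config DB
use config
// only allow balancing between 23:00 and 06:00
db.settings.update(
    { _id: "balancer" },
    { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
    { upsert: true }
)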
As always I would assume you want to try and add these members as soon as you start to see unfavorable performance numbers
It's been my experience that you want to add a shard in anticipation of increased traffic, especially if the number of shards is low (< 6 or so). Migrating data to a new shard will increase IO on the existing nodes, and it will also increase network traffic.
So if you're already experiencing IO issues, adding a shard is just going to make it worse. You may end up "baby-sitting" the migrations or using the balancer window option. In fact, the existence of the balancer window option should tell you something about the intensity of the balancing process.
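"Baby-sitting" the migrations usually just means pausing and resuming the balancer around your busy periods; the shell helpers for that are:
// check whether the balancer is currently enabled
sh.getBalancerState()
// disable it while the cluster is busy (waits for any in-flight migration to finish)
sh.stopBalancer()
// ...and turn it back on afterwards
sh.startBalancer()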
What sort of impacts do you get when this happens?
The other unusual side effect here is that data not normally in memory may be pulled into memory. For example, if you have historical data that sits untouched most of the day, it can be pulled in to be moved even though your clients are not actively reading it.
Again, this will tie back to IO and my comments above.
When I add these new members and Mongo sees that a new shard is available, it then starts to re-arrange the chunks...
Note that this only happens for collections that are sharded (and therefore have a shard key). Unsharded collections don't move at all. This can fly under the radar until traffic starts accumulating on one shard for no obvious reason.
For data that is unsharded, you may want to keep it on a separate Replica Set to ensure that your shards behave as expected.
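Unsharded collections live on their database's primary shard, so one way to do this is to point that primary at a shard you have set aside for the purpose. A sketch, where "mydb" and "shardForUnsharded" are placeholder names; be aware that this copies all of the database's unsharded data to the target shard:
// on a mongos: make the named shard the primary shard for this database
db.adminCommand({ movePrimary: "mydb", to: "shardForUnsharded" })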