MongoDB load balancing in multiple AWS instances

We're using Amazon Web Services for a business application that runs a Node.js server with MongoDB as its database. Currently the Node.js server runs on an EC2 medium instance, and we keep the MongoDB database on a separate micro instance. Now we want to deploy a replica set for the MongoDB database, so that if MongoDB gets locked or becomes unavailable, we can still run the database and get data from it.

So we're trying to keep each member of the replica set on a separate instance, so that we can still get data from the database even if the instance hosting the primary member shuts down.

Now, I want to add a load balancer to the database, so that the database works fine even under a huge traffic load. In that case I can read-balance the database by adding the slaveOK config to the replica set. But that will not load balance the database if there is a huge volume of write operations.

To solve this problem, I have found two options so far.

Option 1: I have to shard the database and keep each shard on a separate instance. Under each shard there will be a replica set on the same instance. But there is a problem: since sharding divides the database into multiple parts, the shards do not hold the same data. So if one instance shuts down, we will not be able to access the data in the shard on that instance.

To solve this, I'm trying to divide the database into shards where each shard has a replica set whose members are on separate instances. Then even if one instance shuts down, we will not face any problem. But if we have 2 shards and each shard has 3 members in its replica set, then I need 6 AWS instances. So I think it's not the optimal solution.

Option 2: We can create a master-master configuration in MongoDB, meaning every database is a primary with read/write access, but I would also like them to auto-sync with each other every so often, so they all end up being clones of each other. Each of these primary databases would be on a separate instance. But I don't know whether MongoDB supports this structure or not.

I have not found any MongoDB docs or blog posts covering this situation. So please suggest what the best solution for this problem would be.

Indranil Mondal asked Jul 10 '14 07:07


2 Answers

This won't be a complete answer by far. There are too many details, and I could write an entire essay about this question, as could many others. However, since I don't have that kind of time to spare, I will add some commentary about what I see.

"Now, I want to add a load balancer to the database, so that the database works fine even under a huge traffic load."

Replica sets are not designed to work like that. If you wish to load balance, you might in fact be looking for sharding, which will allow you to do this.

Replication is for automatic failover.

"In that case I can read-balance the database by adding the slaveOK config to the replica set."

Since your secondaries have to apply just as many write ops as the primary to stay up to date, it seems like this might not help too much.

In reality, instead of having one server with many connections queued, you have many connections on many servers queueing for stale data, since member consistency is eventual, not immediate as in ACID technologies. That being said, the members are only eventually consistent by 32-odd ms, which means they are not lagging enough to give decent throughput if the primary is loaded.

Since reads ARE concurrent, you will get the same speed whether you are reading from the primary or a secondary. I suppose you could delay a slave to create a pause of ops, but that would bring back massively stale data in return.

Not to mention that MongoDB is not multi-master, so you can only write to one node at a time, which makes slaveOK not the most useful setting in the world any more. I have seen 10gen themselves recommend sharding over this setting numerous times.
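
To make the argument above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes a perfectly even distribution of traffic, the workload numbers are made up, and `per_node_load` is a hypothetical helper, not anything from MongoDB. It shows why spreading reads across replica set members leaves the write load on each member unchanged, while sharding divides it.

```python
def per_node_load(reads: float, writes: float, nodes: int, sharded: bool) -> float:
    """Ops/sec handled by each node (or shard) in an idealized, evenly balanced model."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    if sharded:
        # Each shard owns roughly 1/nodes of the data, so reads and writes both split.
        # (In practice each shard is itself a replica set; this counts shards only.)
        return (reads + writes) / nodes
    # Replica set: reads can be spread across members, but every member
    # must still apply every write in order to stay in sync.
    return reads / nodes + writes


# Hypothetical workload: 900 reads/s and 3000 writes/s.
single = per_node_load(900, 3000, nodes=1, sharded=False)
replica = per_node_load(900, 3000, nodes=3, sharded=False)
shards = per_node_load(900, 3000, nodes=3, sharded=True)
print(single, replica, shards)
```

With a write-heavy workload like this one, adding replica set members barely changes the per-node load, because the write term is not divided by anything.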

"Option 2: We can create a master-master configuration in MongoDB"

This would require your own coding, at which point you may want to consider actually using a database that supports multi-master replication: http://en.wikipedia.org/wiki/Multi-master_replication

This is because the speed you are looking for is most likely in writes, not reads, as I discussed above.

"Option 1: I have to shard the database and keep each shard on a separate instance."

This is the recommended way, but you have found the caveat with it. Unfortunately, it remains unsolved, and it is something multi-master replication is supposed to solve. However, multi-master replication brings its own ship of plague rats to Europe, and I would strongly recommend you do some serious research before you conclude that MongoDB cannot currently service your needs.
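
As a small illustration of both the benefit and the caveat, the Python sketch below routes documents to shards by hashing a shard key. It is an assumption-laden toy: `shard_for` and the `user-N` keys are hypothetical, and MongoDB's real hashed sharding assigns ranges of hash values (chunks) to shards rather than taking a simple modulo. The point is only that writes spread out across shards, and that each document lives on exactly one shard, which is why losing an unreplicated shard loses access to its share of the data.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard index using a stable hash (illustration only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical user ids used as the shard key, spread over 2 shards.
keys = [f"user-{i}" for i in range(10_000)]
placement = Counter(shard_for(k, 2) for k in keys)
print(placement)
```

Each key always maps to the same shard, so the two shards hold disjoint subsets of the documents and each receives roughly half of the writes.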

You might be worrying about nothing, really. The fsync queue is designed to deal with the IO bottleneck that would otherwise slow down your writes, as it would in SQL, and reads are concurrent, so if you plan your schema and working set right you should be able to get a massive number of ops.

There is in fact a linked question around here from a 10gen employee that is very good to read: https://stackoverflow.com/a/17459488/383478 and it shows just how much throughput MongoDB can achieve under load.

That throughput will grow soon with the new document-level locking that is already in the dev branch.

Sammaye answered Nov 08 '22 18:11


Option 1 is the recommended way, as pointed out by @Sammaye, but you would not need 6 instances; you can manage it with 4.

Assume you need the configuration below.

  • 2 shards (S1, S2)
  • 1 copy for each shard (Replica set secondary) (RS1, RS2)
  • 1 Arbiter for each shard (RA1, RA2)

You could then divide your server configuration as below.

Instance 1 : Runs : S1 (Primary Node)
Instance 2 : Runs : S2 (Primary Node)
Instance 3 : Runs : RS1 (Secondary Node S1) and RA2 (Arbiter Node S2)
Instance 4 : Runs : RS2 (Secondary Node S2) and RA1 (Arbiter Node S1)

You could run the arbiter nodes alongside your secondary nodes, which would help with elections during failovers.
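
The layout above can be sanity-checked with a short Python sketch. It is a simplified model, not MongoDB's election logic: it assumes every member has one vote, that a shard stays available if a strict majority of its members is reachable and at least one of them is data-bearing, and the names `SHARDS`, `shard_survives` and `cluster_survives` are made up for illustration.

```python
from itertools import combinations

# The four-instance layout from above: member name -> (instance number, kind).
SHARDS = {
    "shard1": {"S1": (1, "data"), "RS1": (3, "data"), "RA1": (4, "arbiter")},
    "shard2": {"S2": (2, "data"), "RS2": (4, "data"), "RA2": (3, "arbiter")},
}


def shard_survives(members: dict, down: set) -> bool:
    """True if the replica set can still have a primary with the `down` instances lost."""
    up = [kind for inst, kind in members.values() if inst not in down]
    has_majority = len(up) > len(members) // 2  # strict majority of voting members
    has_data_node = "data" in up  # arbiters hold no data and cannot become primary
    return has_majority and has_data_node


def cluster_survives(down: set) -> bool:
    """True if every shard keeps a primary when the given instances are down."""
    return all(shard_survives(members, down) for members in SHARDS.values())


single_failures = {i: cluster_survives({i}) for i in range(1, 5)}
double_failures = {
    pair: cluster_survives(set(pair)) for pair in combinations(range(1, 5), 2)
}
print(single_failures)
print(double_failures)
```

Under these assumptions, losing any one instance leaves both shards with two of their three members, so the layout tolerates a single instance failure. Most two-instance failures take at least one shard down, which is the trade-off for using 4 instances instead of 6.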

Lalit Agarwal answered Nov 08 '22 18:11