 

Configure a Mongo replica set to only replicate certain collections

I have a ~3GB mongo database with several dozen collections. Three of these collections handle ~300 queries per second, while the rest sustain a much lower volume. I expect the traffic to continue to grow quickly.

I'd like to set up a replica set to handle the high-traffic collections. It isn't necessary for this new instance to replicate the rest of the database. Is this possible?

mchail asked Jan 22 '13 21:01



2 Answers

This doesn't seem to be possible at the moment with MongoDB's built-in features. The only way to do it is to write your own manual replication logic or to use a tool written by a third party.

According to the following post, the https://github.com/wordnik/wordnik-oss project might help you achieve this:

https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/Ap9V4ArGuFo

This related question describes a workaround for filtering documents during replication:

Replicate only documents where {'public':true} in MongoDB

Or just replicate the data yourself manually, which might be worth trying.
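To make the "replicate it yourself" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `copy_collections` is a hypothetical helper, and the database objects are assumed to behave like pymongo `Database` objects (indexing by collection name, `find()`, and `replace_one(..., upsert=True)`). It takes a one-off copy of the chosen collections and does not propagate deletes, so it is not continuous replication.

```python
from typing import Any, Dict, Iterable


def copy_collections(source_db: Any, target_db: Any, names: Iterable[str]) -> Dict[str, int]:
    """Copy every document of the named collections from source_db to target_db.

    Hypothetical helper: both arguments are assumed to behave like pymongo
    Database objects. Returns a mapping of collection name to documents copied.
    """
    copied = {}
    for name in names:
        count = 0
        for doc in source_db[name].find():
            # Upsert by _id so that re-running the copy does not create duplicates.
            target_db[name].replace_one({"_id": doc["_id"]}, doc, upsert=True)
            count += 1
        copied[name] = count
    return copied
```

With pymongo installed, the two arguments would typically be databases obtained from two separate `MongoClient` connections, and the list of names would be the three high-traffic collections. Collections that are not listed are never touched.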

Good luck.

cubbuk answered Sep 20 '22 12:09



No, that isn't possible right now. What you could do is move the low-traffic collections into a separate, unreplicated database. But this will cause headaches once those collections see higher traffic too, because you would then need to move them into your "replication" database.

But in general, replication isn't the way to go if you need to scale; it's intended more for DR/failover. Replica set secondaries can (optionally) answer read queries but never write queries, which is something you should keep in mind. So if you have a high write load, this may not cure your problem.
Once you allow your application to read from secondaries, you need to live with eventual consistency, meaning that your application isn't guaranteed to always see the latest data. This is caused by the asynchronous replication to the secondaries.
You can indeed cure this problem by configuring your write concern so that a write needs to succeed on all replicas before it's considered written and your driver returns. But this may slow down your write operations significantly.
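As a rough illustration of where these two settings live, the sketch below builds a connection string that asks the driver to prefer secondaries for reads and to wait for a given number of members on each write. The option names `replicaSet`, `readPreference` and `w` are standard MongoDB connection string options; the helper `replica_set_uri`, the host names and the replica set name are made up for the example.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred", w="majority"):
    """Build a MongoDB connection string carrying read preference and write concern.

    Hypothetical helper for illustration. `w` may be a member count (for
    example 3 to wait for all members of a three-node set) or "majority".
    """
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
        "w": w,
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Example: a three-member set where every write waits for all three members.
uri = replica_set_uri(["db1:27017", "db2:27017", "db3:27017"], "rs0", w=3)
```

Setting `w` to the full member count gives the "written on all replicas" behaviour described above, at the cost of write latency; `"majority"` is a common compromise.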

So for scaling query execution capacity I would go with sharding. This is possible at a per-collection level; all unsharded collections will remain on a "default shard".
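A minimal sketch of what per-collection sharding looks like in practice: the function below only builds the two admin command documents, `enableSharding` and `shardCollection`, for a single collection. The helper `sharding_commands`, the database name, the collection name and the shard key are all assumptions made for the example; a real shard key has to be chosen to fit your query pattern.

```python
def sharding_commands(database, collection, shard_key):
    """Return the admin commands that shard a single collection.

    Hypothetical helper: it only builds the command documents and does not
    talk to a server.
    """
    namespace = f"{database}.{collection}"
    return [
        # Allow collections in this database to be sharded.
        {"enableSharding": database},
        # Shard only the named collection on the given key.
        {"shardCollection": namespace, "key": shard_key},
    ]


# Example with made-up names: shard one hot collection on a hashed _id.
commands = sharding_commands("mydb", "events", {"_id": "hashed"})
```

Each document would then be run against the `admin` database through a `mongos` router, for example with pymongo's `client.admin.command(cmd)`. Collections that are never passed to `shardCollection` stay unsharded.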

philnate answered Sep 18 '22 12:09
