 

MongoDB one way replication

Tags:

mongodb

I need some way to push data from client databases to a central database. There are several instances of MongoDB running on remote machines (clients), and I need a method to periodically update the central MongoDB database with newly added and modified documents from the clients. Each client must replicate its records to a single central server.

For example:

If I have 3 MongoDB instances running on 3 machines, each holding 10GB of data, then after the data migration the 4th machine's MongoDB must have 30GB of data. The central MongoDB machine must be periodically updated with data from all 3 machines. But these 3 machines don't only receive new documents; existing documents in them may also be updated, and I would like the central MongoDB machine to receive those updates as well.

asked Dec 14 '12 by deepakmodak


1 Answer

Your desired replication strategy is not formally supported by MongoDB.

A MongoDB replica set consists of a single primary with asynchronous replication to one or more secondary servers in the same replica set. You cannot configure a replica set with multiple primaries or replication to a different replica set.

However, there are a few possible approaches for your use case depending on how actively you want to keep your central server up to date and the volume of data/updates you need to manage.

Some general caveats:

  • Merging data from multiple standalone servers can create unexpected conflicts. For example, unique indexes would not know about documents created on other servers.

  • Ideally, keep the data you are consolidating separated by a unique database name per origin server, so that unrelated documents from different origin servers that happen to share the same namespace and _id do not collide.

Approach #1: use mongodump and mongorestore

If you just need to periodically sync content to your central server, one way to do so is using mongodump and mongorestore. You can schedule a periodic mongodump from each of your standalone instances and use mongorestore to import them into the central server.

Caveats:

  • There is a --db parameter for mongorestore that allows you to restore into a different database from the original name (if needed)

  • mongorestore only performs inserts into the existing database (i.e. does not perform updates or upserts). If existing data with the same _id already exists on the target database, mongorestore will not replace it.

  • You can use mongodump options such as --query to be more selective on data to export (for example, only select recent data rather than all)

  • If you want to limit the amount of data to dump & restore on each run (for example, only exporting "changed" data), you will need to work out how to handle updates and deletions on the central server.

Given the caveats, the simplest use of this approach would be to do a full dump & restore (i.e. using mongorestore --drop) to ensure all changes are copied.
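As a concrete sketch of that full dump & restore cycle, the script below prints the mongodump/mongorestore command lines it would run for each client (a dry run; drop the echo to actually execute them). The hostnames, client list, and backup directory are placeholder assumptions, not values from the question.

```shell
#!/bin/sh
# Dry-run sketch of a periodic full sync: dump each standalone client
# and restore it into the central server. Hostnames, the backup
# directory, and the client list below are placeholders.
CENTRAL="central.example.net:27017"
BACKUP_DIR="/var/backups/mongo"

for CLIENT in client1.example.net client2.example.net client3.example.net; do
    # Dump everything from this client into its own subdirectory.
    # (A --query option could be added here to export only recent data.)
    DUMP_CMD="mongodump --host $CLIENT --out $BACKUP_DIR/$CLIENT"

    # --drop removes each collection before restoring, so updated
    # documents are replaced; plain mongorestore only inserts and
    # will not overwrite an existing _id.
    RESTORE_CMD="mongorestore --host $CENTRAL --drop $BACKUP_DIR/$CLIENT"

    echo "$DUMP_CMD && $RESTORE_CMD"   # dry run: print, don't execute
done
```

Scheduling this from cron on each client (or centrally) gives you the periodic update described in the question, at the cost of re-copying everything on each run.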

Approach #2: use a tailable cursor with the MongoDB oplog.

If you need more realtime or incremental replication, a possible approach is creating tailable cursors on the MongoDB replication oplog.

This approach is basically "roll your own replication". You would have to write an application which tails the oplog on each of your MongoDB instances and looks for changes of interest to save to your central server. For example, you may only want to replicate changes for selective namespaces (databases or collections).
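To make the "roll your own replication" idea concrete, here is a minimal Python sketch. The `classify` helper maps an oplog document to a replication action; the op codes and fields (`op`, `ns`, `o`, `o2`) follow the oplog entry format, but the `WATCHED_NS` filter and the tailing loop shown in the comments are illustrative assumptions, not part of this answer.

```python
# Minimal helper for hand-rolled oplog replication: decide what to do
# with a single oplog entry. WATCHED_NS is an illustrative filter for
# the namespaces you want to replicate.
WATCHED_NS = {"appdata.records"}

def classify(entry):
    """Map an oplog document to (action, _id), or None to skip it.

    Oplog op codes: 'i' = insert, 'u' = update, 'd' = delete;
    'n' (no-op) and 'c' (command) entries are ignored here.
    """
    if entry.get("ns") not in WATCHED_NS:
        return None
    op = entry.get("op")
    if op == "i":
        return ("upsert", entry["o"]["_id"])   # full document in "o"
    if op == "u":
        return ("update", entry["o2"]["_id"])  # match criteria in "o2"
    if op == "d":
        return ("delete", entry["o"]["_id"])
    return None

# The tailing loop itself would look roughly like this (requires
# pymongo and a mongod started with --replSet so local.oplog.rs exists):
#
#   client = pymongo.MongoClient("client1.example.net")
#   oplog = client.local["oplog.rs"]
#   cursor = oplog.find({"ts": {"$gt": last_seen_ts}},
#                       cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
#   for entry in cursor:
#       action = classify(entry)
#       # ...apply the action to the central server and remember
#       # entry["ts"] so you can resume after a restart.
```

Persisting the last-seen `ts` timestamp per client is what makes this incremental: on restart you resume tailing from where you left off instead of replaying the whole oplog.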

A related tool that may be of interest is the experimental Mongo Connector from 10gen labs. This is a Python module that provides an interface for tailing the replication oplog.

Caveats:

  • You have to implement your own code for this, and learn/understand how to work with the oplog documents

  • There may be an alternative product which better supports your desired replication model "out of the box".

answered Oct 19 '22 by Stennie