
Are there any frameworks to synchronize data generated on one peer with all other peers in an unreliable network?

We are developing a system with the following requirements.

  • There are N systems that each generate data that is unique to themselves
  • Each system requires the data from every other system to perform its end goal
  • These systems communicate with each other over an unreliable network.
  • Some systems are expected to be completely unavailable for extended periods of time (but they may still be in contact with some of their peers, who are in contact with the rest of the network)

To put it another way, each system needs to replicate its data to the other N-1 systems. Ideally, this would be done intelligently (for example, relaying data through whichever peers happen to be reachable).

I have considered looking into database synchronization frameworks, but I am concerned that they are overkill for this problem. I don't think there is any possibility of row conflicts, because each system's data is entirely independent of the other systems' data.

The question is: do you know of any frameworks that could help solve this problem? Alternatively, is there a better way to phrase this problem that might lead me toward a solution?

Finally, this framework would ideally be in C++ (and potentially Java).

asked Mar 07 '12 23:03 by Justin Breitfeller



3 Answers

SymmetricDS.org

The solution you are looking for sounds a lot like the open source software SymmetricDS.

"SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and bi-directional synchronization. It uses web and database technologies to replicate tables between relational databases, in near real time if desired. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage."
-SymmetricDS.org

SymmetricDS was designed to be used as a Java library as well as a standalone application. Used with a lightweight database such as H2, it could avoid the overkill you are worried about. H2 can optionally run embedded within an application and can store data in memory or on disk.

Disclaimer: I recently started working for JumpMind, the company that develops this software.

answered Oct 13 '22 01:10 by Austin Brougher


0MQ (ZeroMQ). It is a messaging library with a C API and C++ bindings. Notably, it supports EPGM (reliable multicast encapsulated in UDP) and N-to-N connections. That said, your particular use case will still require some work on top of it.

answered Oct 13 '22 01:10 by J.N.


Interesting problem. Many of the requirements you've described are a particularly good fit for the BitTorrent protocol.

answered Oct 12 '22 23:10 by MattDavey