
Keeping distributed databases synchronized in an unstable network

I'm facing the following challenge:

I have a number of databases in different geographical locations, connected over a cellular network that may fail frequently. I need to keep all of the databases synchronized, but synchronization does not need to happen in real time. I'm using Java, but I'm free to choose any free database.

Any suggestions on how I can achieve this?

Thanks.

jassuncao asked Sep 24 '09 17:09



2 Answers

This is a problem with a well-established body of research (of which people are apparently unaware). I suggest not reinventing a poor, defective wheel unless absolutely necessary (for example, when your requirements are so unusual that they allow a trivial solution).

Some keywords: replication, mobile DBMSs, distributed disconnected DBMSs.

These research papers are also relevant (as examples from this research field):

  • Distributed disconnected databases
  • The dangers of replication and a solution
  • Improving Data Consistency in Mobile Computing Using Isolation-Only Transactions
  • Dealing with Server Corruption in Weakly Consistent, Replicated Data Systems
  • Rumor: Mobile Data Access Through Optimistic Peer-to-Peer Replication
  • The Case for Non-transparent Replication: Examples from Bayou
  • Bayou: replicated database services for world-wide applications
  • Managing update conflicts in Bayou, a weakly connected replicated storage system
  • Two-level client caching and disconnected operation of notebook computers in distributed systems
  • Replicated document management in a group communication system

... and so on.

MaD70 answered Oct 23 '22 15:10


I am not aware of any database that gives you this functionality out of the box. There is a lot of complexity here due to the need for eventual consistency and conflict resolution. For example, what happens if the network is split into two halves, you update a value to 123 while I update it to 321 on the other half, and then the networks reconnect?

You may have to roll your own.

For some ideas on how to do this, check out the design of Yahoo's PNUTS system: http://research.yahoo.com/node/2304 and Amazon's Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
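
To make the split-network case concrete: Dynamo, for instance, detects this kind of concurrent update with vector clocks. Below is a minimal Java sketch of that idea, not code from either system. The class names, the replica ids "siteA" and "siteB", and the choice to keep one counter per replica are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of any library: a version vector maps a
// replica id to the number of updates that replica has made to a record.
final class VersionVector {
    enum Order { EQUAL, BEFORE, AFTER, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    // Returns a copy with the given replica's counter bumped by one.
    // Call it whenever that replica writes the record locally.
    VersionVector increment(String replicaId) {
        VersionVector next = new VersionVector();
        next.counters.putAll(counters);
        next.counters.merge(replicaId, 1L, Long::sum);
        return next;
    }

    // Compares two vectors. CONCURRENT means neither write saw the other,
    // so the application has a conflict to resolve.
    Order compare(VersionVector other) {
        Set<String> ids = new HashSet<>(counters.keySet());
        ids.addAll(other.counters.keySet());
        boolean less = false;
        boolean greater = false;
        for (String id : ids) {
            long mine = counters.getOrDefault(id, 0L);
            long theirs = other.counters.getOrDefault(id, 0L);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    // Element-wise maximum: the version to store once a conflict is resolved.
    VersionVector merge(VersionVector other) {
        VersionVector merged = new VersionVector();
        merged.counters.putAll(counters);
        other.counters.forEach((id, n) -> merged.counters.merge(id, n, Math::max));
        return merged;
    }
}

final class PartitionDemo {
    public static void main(String[] args) {
        VersionVector base = new VersionVector().increment("siteA"); // last state both sides saw
        VersionVector mine = base.increment("siteA");   // I write 123 on one half
        VersionVector yours = base.increment("siteB");  // you write 321 on the other half
        System.out.println(mine.compare(yours)); // CONCURRENT
        System.out.println(base.compare(mine));  // BEFORE
    }
}
```

The vector only tells you that 123 and 321 conflict. Which value wins (last writer, a site priority, or an application-specific merge) is still a policy you have to choose.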

SquareCog answered Oct 23 '22 14:10