 

Data Synchronization in a Distributed System

We have a REST-based application built on the Restlet framework which supports CRUD operations. It uses a local file to store the data.

Now the requirement is to deploy this application on multiple VMs, and any update operation in one VM needs to be propagated to the application instances running on the other VMs.

Our idea to solve this was to send multiple POST messages (to all other applications) when an update operation happens in a given VM. The assumption here is that each application has a list of the URLs of all other applications.

Is there a better way to solve this?
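For reference, the fan-out idea described above can be sketched as follows. This is a minimal illustration in Python (the actual app is Restlet/Java); the peer URLs are hypothetical, and the sketch makes the core weakness visible: there is no ordering or delivery guarantee, so concurrent updates can arrive at peers in different orders.

```python
import json
import urllib.request

# Hypothetical list of peer instance URLs; in practice each VM would
# load this from configuration.
PEER_URLS = [
    "http://10.0.0.2:8080/items",
    "http://10.0.0.3:8080/items",
]

def propagate_update(entity: dict, peers=PEER_URLS) -> list:
    """Fan an update out to every peer via HTTP POST.

    Returns the peers that could not be reached, so the caller can
    retry them later. Note there is no ordering guarantee: two
    concurrent updates may reach peers in different orders.
    """
    failed = []
    body = json.dumps(entity).encode("utf-8")
    for url in peers:
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=5):
                pass
        except OSError:
            failed.append(url)
    return failed
```

The answers below explain why this scheme alone is not enough for consistency.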

sanfin asked Dec 22 '22 09:12


2 Answers

Consistency is a deep topic, and a hard thing to get right. The trouble comes when two nearly-simultaneous changes occur to the same data: conflicting updates can arrive in one order on one server, and in another order on another. This is a problem, since the two servers no longer agree on what the data is, and it isn't clear who is "right".

The short story: get your favorite RDBMS (for example, MySQL is popular) and have your app servers connect to it in what is called the three-tier model. Be sure to perform complex updates in transactions, which will provide an acceptable consistency model.
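To make the transaction point concrete, here is a minimal sketch in Python using sqlite3 purely as a stand-in for a shared RDBMS such as MySQL (the table and amounts are invented for illustration). The point is that a multi-step update either fully commits or fully rolls back, so no instance ever observes a half-applied change:

```python
import sqlite3

# sqlite3 stands in for the shared database all app servers connect to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

def transfer(conn, src, dst, amount):
    """Move `amount` between two rows atomically: either both UPDATEs
    take effect or neither does."""
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )

transfer(conn, 1, 2, 30)
```

If the second `UPDATE` raised an exception, the `with conn:` block would roll the first one back automatically.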

The long story: the three-tier model serves well for small-to-medium scale web sites/services. You will eventually find that the single database becomes the bottleneck. For services whose read traffic is substantially larger than write traffic, a common optimization is to create a single-master, many-slave database replication arrangement, where all writes go to the single master (required for consistency with non-distributed transactions), but the more common reads can go to any of the read slaves.
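The read/write routing that replication implies can be sketched like this. A minimal illustration, assuming a hypothetical `ReplicatedDB` wrapper; here the master and replicas are just string labels, where a real setup would hold actual database connections:

```python
import random

class ReplicatedDB:
    """Sketch of single-master, many-slave routing: every write goes
    to the master, reads are spread across the read replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def connection_for(self, sql: str):
        # Naive classification: anything that is not a SELECT is a write.
        if sql.lstrip().upper().startswith("SELECT"):
            return random.choice(self.replicas)
        return self.master

db = ReplicatedDB("master", ["replica-1", "replica-2"])
```

One caveat this sketch hides: replicas lag the master slightly, so a read issued right after a write may not see that write.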

For services with evenly-mixed read/write traffic, you may be better served by dropping some of the conveniences (and accompanying restrictions) that formal SQL provides and instead using one of the various "NoSQL" data stores that have recently emerged. Their relative merits and fitness for various problems is a deep topic in itself.

phs answered Jan 13 '23 12:01


I can see 7 major options for now. You should find out more details and decide whether the facilities and trade-offs are appropriate for your purpose:

  1. Perform the CRUD operations on a common RDBMS. Simplest and most consistent.
  2. Perform the CRUD operations on a common RDBMS that runs as a fast in-memory RDBMS, e.g. TimesTen from Oracle.
  3. Perform the CRUD on a distributed cache, or your own home-cooked distributed hash table that can guarantee synchronization, e.g. Hazelcast/Ehcache and others.
  4. Use a fast common state server like Redis/memcached, perform your updates on it in a synchronized manner, and write out the successful operations to a DB in a lazy manner if required.
  5. Distribute your REST servers such that the CRUD operations on a single entity are only performed by a single master. Once this is done, the details about the changes can be communicated to everyone else using a reliable message bus or a distributed database (e.g. Postgres) that runs underneath and syncs all of your updates fairly fast.
  6. Target eventual consistency and use a distributed data store like Cassandra, which lets you tune the consistency level you require.
  7. Use distributed consensus algorithms like Paxos or Raft, or an implementation of the same (recommended) like ZooKeeper or etcd respectively, and take ownership of the item you want to change from each REST server before you perform the CRUD operation. This might be a bit slow, though, and is essentially what Cassandra can give you.
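To illustrate option 5, here is a minimal sketch of assigning each entity a single owning server by hashing its id. The server names are hypothetical; every instance runs the same function, so they all agree on the owner without any coordination:

```python
import hashlib

# Hypothetical names of the REST server instances.
SERVERS = ["rest-1", "rest-2", "rest-3"]

def owner_of(entity_id: str, servers=SERVERS) -> str:
    """Deterministically map an entity id to its single master server,
    so all CRUD for that entity funnels through one node and two nodes
    can never apply conflicting writes to it concurrently."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def handle_update(entity_id: str, local_server: str) -> str:
    """Apply the update locally if we own the entity, else forward it."""
    owner = owner_of(entity_id)
    if owner == local_server:
        return "apply locally, then broadcast change"
    return f"forward request to {owner}"
```

Note that plain modulo hashing reshuffles most ownerships when a server joins or leaves; consistent hashing is the usual refinement for that.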
computinglife answered Jan 13 '23 11:01