How would you program strong read-after-write consistency in a distributed system?

Recently, AWS announced strong read-after-write consistency for S3. I'm curious how one can program that. Doesn't it violate the CAP theorem?

In my mind, the simplest way is to wait for replication to complete and only then return, but that would degrade performance.
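
Roughly what I have in mind, as a sketch. The Replica class and method names are made up, just to illustrate the "wait for replication, then return" idea:

    class Replica:
        def __init__(self):
            self.store = {}

        def write(self, key, value):
            self.store[key] = value

        def read(self, key):
            return self.store.get(key)

    class SyncReplicatedStore:
        """Acknowledge a PUT only after every replica has applied it."""

        def __init__(self, replicas):
            self.replicas = replicas

        def put(self, key, value):
            # Synchronous fan-out: the caller waits for all replicas,
            # so any subsequent read sees the new value.
            for replica in self.replicas:
                replica.write(key, value)
            return "200 OK"

        def get(self, key):
            # Any replica works, because writes land everywhere before the ack.
            return self.replicas[0].read(key)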

AWS says that there is no performance difference. How is this achieved?

Another thought is that Amazon has a giant index table that keeps track of all S3 objects and where they are stored (triple replication, I believe), and it would need to update this index on every PUT/DELETE. Is that technically feasible?
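
Something like this is what I'm imagining for the index (again, purely hypothetical names):

    class MetadataIndex:
        def __init__(self):
            # key -> {"version": int, "locations": [node ids holding the latest copy]}
            self.entries = {}

        def record_put(self, key, locations):
            entry = self.entries.get(key, {"version": 0, "locations": []})
            entry["version"] += 1
            entry["locations"] = list(locations)
            self.entries[key] = entry

        def record_delete(self, key):
            self.entries.pop(key, None)

        def lookup(self, key):
            # A GET would consult the index first, then fetch from one of the
            # listed replicas; the index itself has to be updated together with
            # every PUT/DELETE for reads to stay strongly consistent.
            return self.entries.get(key)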

asked Dec 16 '20 by okysabeni


People also ask

How do you ensure read-after-write consistency?

Read-after-write consistency is the ability to view changes (read data) right after making those changes (write data). For example, if you have a user profile and you change your bio on the profile, you should see the updated bio if you refresh the page. There should be no delay during which the old bio shows up.
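
Expressed as a test, the guarantee looks roughly like this; the in-memory store is just a stand-in for whatever backend actually holds the profile:

    class InMemoryProfileStore:
        def __init__(self):
            self.data = {}

        def put(self, key, value):
            self.data[key] = value

        def get(self, key):
            return self.data.get(key)

    def test_read_after_write():
        store = InMemoryProfileStore()
        store.put("user:42:bio", "Distributed systems enthusiast")
        # No window during which the old bio may still be returned.
        assert store.get("user:42:bio") == "Distributed systems enthusiast"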

How can we achieve strong consistency in a distributed system?

Strong Consistency: Strong consistency means that every server node, anywhere in the world, holds the same value for an entity at any point in time. One way to implement this behavior is to lock the nodes while they are being updated.
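
A minimal sketch of that lock-the-nodes approach, purely for illustration (real systems typically use quorum or consensus protocols rather than naive global locks):

    import threading

    class Node:
        def __init__(self):
            self.store = {}
            self.lock = threading.Lock()

    class StronglyConsistentStore:
        def __init__(self, nodes):
            self.nodes = nodes

        def put(self, key, value):
            # Lock every node so no reader can observe a half-applied update.
            for node in self.nodes:
                node.lock.acquire()
            try:
                for node in self.nodes:
                    node.store[key] = value
            finally:
                for node in self.nodes:
                    node.lock.release()

        def get(self, key):
            # A read blocks until any in-flight write has reached all nodes.
            node = self.nodes[0]
            with node.lock:
                return node.store.get(key)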

How can a distributed database guarantee read-your-writes consistency?

The most common way to scale the reads hitting a distributed data store is by adding read replicas. These replicas handle the system's reads, freeing up the master to deal with the writes.
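
A bare-bones sketch of that read/write split, with dict-like stores as placeholders; whether reads on the replicas are strongly consistent then depends entirely on how writes are propagated to them:

    import random

    class ReadWriteRouter:
        """Writes go to the master; reads are spread across the replicas."""

        def __init__(self, master, replicas):
            self.master = master          # dict-like store taking all writes
            self.replicas = replicas      # list of dict-like read replicas

        def write(self, key, value):
            self.master[key] = value
            # Replication from master to replicas happens separately;
            # its timing decides how fresh replica reads are.

        def read(self, key):
            return random.choice(self.replicas).get(key)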


1 Answer

As Martin indicated above, there is a Reddit thread that discusses this. The top response, from u/ryeguy, gave this answer:

If I had to guess, s3 synchronously writes to a cluster of storage nodes before returning success, and then asynchronously replicates it to other nodes for stronger durability and availability. There used to be a risk of reading from a node that didn't receive a file's change yet, which could give you an outdated file. Now they added logic so the lookup router is aware of how far an update is propagated and can avoid routing reads to stale replicas.

I just pulled all this out of my ass and have no idea how s3 is actually architected behind the scenes, but given the durability and availability guarantees and the fact that this change doesn't lower them, it must be something along these lines.
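
To make that concrete, here is a rough sketch of a lookup router that remembers how far each write has propagated and only serves reads from caught-up replicas. The names and the sync_count knob are invented; this is not how S3 is actually built:

    import random

    class VersionAwareRouter:
        def __init__(self, replicas, sync_count=2):
            self.replicas = replicas          # replica id -> dict-like store
            self.sync_count = sync_count      # replicas written before the ack
            self.latest = {}                  # key -> newest acknowledged version
            self.applied = {}                 # key -> {replica id: version applied}

        def put(self, key, value):
            version = self.latest.get(key, 0) + 1
            ids = list(self.replicas)
            sync_ids = ids[:self.sync_count]

            # Synchronous writes: the PUT is not acknowledged until these land.
            for rid in sync_ids:
                self.replicas[rid][key] = value
                self.applied.setdefault(key, {})[rid] = version
            self.latest[key] = version

            # Asynchronous replication to the remaining replicas would update
            # self.applied[key][rid] as it completes (omitted here).
            return version

        def get(self, key):
            version = self.latest.get(key, 0)
            # Only replicas known to hold the newest version are eligible.
            fresh = [rid for rid, v in self.applied.get(key, {}).items() if v >= version]
            if not fresh:
                return None
            return self.replicas[random.choice(fresh)].get(key)

The catch, of course, is that the router's own metadata has to be kept consistent with every write, which pushes the problem into the lookup layer rather than eliminating it.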

Better answers are welcome.

answered Sep 26 '22 by okysabeni