Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

RAFT consensus protocol - Should entries be durable before commiting

Tags:

raft

consensus

I have the following query about implementation RAFT:

Consider the following scenario\implementation:

  1. RAFT leader receives a command entry, it appends the entry to an in-memory array It then sends the entries to followers (with the heartbeat)
  2. The followers receive the entry and append it to their in-memory array and then send a response that it has received the entry
  3. The leader then commits the entry by writing it to a durable store (file) The leader sends the latest commit index in the heartbeat
  4. The followers then commit the entries based on leader's commit index by storing the entry to their durable store (file)

One of the implementations of RAFT (link: https://github.com/peterbourgon/raft/) seems to implement it this way. I wanted to confirm if this fine.

Is it OK if entries are maintained "in memory" by the leader and the followers until it is committed? In what circumstances might this scenario fail?

like image 737
coder_bro Avatar asked Mar 20 '23 18:03

coder_bro


1 Answers

I disagree with the accepted answer.

  1. A disk isn't mystically durable. Assuming the disk is local to the server it can permanently fail. So clearly writing to disk doesn't save you from that. Replication is durability provided that the replicas live in different failure domains which if you are serious about durability they will be. Of course there are many hazards to a process that disks don't suffer from (linux oom killed, oom in general, power etc), but a dedicated process on a dedicated machine can do pretty well. Especially if the log store is say ramfs, so process restart isn't an issue.

  2. If log storage is lost then host identity should be lost as well. A,B,C identify logs. New log, new id. B "rejoining" after (potential) loss of storage is simply a buggy implementation. The new process can't claim the identity of B because it can't be sure that it has all the information that B had. Just like in the case of always flushing to disk if we replaced the disk of the machine hosting B we couldn't just restart the process with it configured to have B's identity. That would be nonsense. It should restart as D in both cases then ask to join the cluster. At which point the problem of losing committed writes disappears in a puff of smoke.

like image 127
benjumanji Avatar answered May 16 '23 08:05

benjumanji