Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Leader election for paxos-based replicated key value store

I am going to implement a key value store with multi Paxos. I would have several nodes, one of which is the primary node. This primary node receive update requests and replicate values to slave nodes.

My question is how the primary node (or leader) is selected? Can I still use the Paxos algorithm? If so, do you think it is necessary to abstract the paxos implementation to a single unit that could be used not only by the replication unit but also the leader election unit?

If I use the node with the least id to be the leader? How can I implement the master lease?

Thanks for any answers.

like image 504
yuefengz Avatar asked Mar 25 '14 02:03

yuefengz


People also ask

Is Paxos used for leader election?

Performing leader election with Paxos then becomes as simple as proposing the statement “I am the leader”. If it is accepted, then you are the leader until another node proposes to be the leader.

What databases use Paxos?

The Clustrix database is a distributed database that uses Paxos in the transaction manager. Paxos is used by the database internals to coordinate messages and maintain transaction atomicity in a distributed system. Acceptors are the nodes that log the state of the transaction.

How does Paxos work?

In general terms, Paxos selects a single value from one or more of the values that are proposed, and then broadcasts that value to all of the cooperating computers. Once the Paxos algorithm has run, all of the computers (or database nodes)) agree upon the proposed value, and the cluster clocks forward.

What is Paxos and raft?

Unlike Paxos, which is derived directly from the distributed consensus problem, Raft is proposed from the multi-replicated state machine. It uses stronger assumptions to reduce the states to be considered, making it easy to understand and implement. Raft and Multi-Paxos are highly related.


1 Answers

Before I get to the actual question, I would suggest that for a paxos-like system, you don't think of it as a master-slave relationship, but rather an equal-peer relationship. Basic Paxos doesn't even have a leader concept. Multi-paxos tacks on a leader as a performance optimization, electing that leader is part of the protocol.

Multi-Paxos boils down to Paxos underneath: there is a prepare phase and an accept phase. The insight of Multi-Paxos is that once a node wins an accept round, it has simultaneously won leader election and after that the prepare phase isn't necessary from that leader until it detects that another node has taken over leadership.


And now some practical advise. I have many years of experience working on several paxos, multi-paxos, and other consensus systems.

I first suggest not implementing either Paxos or Multi-paxos. Optimizing Paxos systems for performance while keeping it correct is very hard—especially if you are having these types of questions. I would instead look into implementing the Raft protocol.

Taking both protocols as is right off the paper, the Raft protocol can have much better throughput than Multi-Paxos. The Raft authors (and others) suggest that Raft is easier to understand, and implement.

You may also look into using one of the open-source Raft systems. I don't have experience with any of them to tell you how easy it is to maintain. I have heard, though, of pain in maintaining Zookeeper instances. (I have also heard complaints about Zookeeper's correctness proof.)

Next, it has been proven that every consensus protocol can loop forever. Build into your system a time-out mechanism, and randomized backoffs where appropriate. This is how practical engineers get around theoretical impossibilities.

Finally, examine your throughput needs. If your throughput is high enough, you will need to figure out how to partition across several consensus-clusters. And that's a whole 'nother ball of wax.

like image 139
Michael Deardeuff Avatar answered Nov 11 '22 01:11

Michael Deardeuff