I am trying to understand how three-phase commit avoids blocking
Consider the following two failure scenarios:
Scenario 1: In phase 2 the coordinator sends preCommit messages to all cohorts and has gotten an ack from all except cohort A. Network problems prevent cohort A from receiving the coordinator's preCommit message. Cohort A times out waiting for the preCommit message and chooses to abort. Then both the coordinator and cohort A crash.
Scenario 2: The protocol reaches phase 3. The coordinator sends a doCommit message to cohort A. But before it can send more doCommit messages the coordinator crashes. Cohort A commits its part of the transaction then crashes.
As far as I can tell the remaining cohorts have the exact same state at the end of scenario 1 and scenario 2. So when a recovery coordinator steps in how can it find out from the remaining cohorts whether we are in scenario 1 and abort or we are in scenario 2 and commit and thus avoid blocking?
Three-Phase Commit (3PC) Protocol is an extension of the Two-Phase Commit (2PC) Protocol that avoids blocking problem under certain assumptions. In particular, it is assumed that no network partition occurs, and not more than k sites fail, where we assume 'k' is predetermined number.
Three-phase commit (3PC) is a synchronization protocol that ensures global atomicity of distributed transactions while alleviating the blocking aspect of 2PC (Two-Phase Commit) in the events of site failures. That is, 3PC never requires operational sites to wait (i.e., block) until a failed site has recovered.
In summary, the 2PC protocol is a blocking Two-Phase commit protocol. The 3PC protocol is a non-blocking Three-Phase commit protocol. However, the 3PC protocol does not recover in the event the network in segmented situation.
Three-phase commit isn't magic; it's just more resilient than two-phase commit. In particular, 3PC is resilient against single-point failure, but not all kinds of multiple-point failure. Both scenarios in the question posit two-point failures.
Three-phase commit isn't magic; it's just more resilient than two-phase commit. In particular, 3PC is resilient against single-point failure, but not all kinds of multiple-point failure. Both scenarios in the question posit two-point failures. In other words, the premise of the question is misguided; it's asking more of 3PC than it's capable of.
For further reading, here's a sentence from the abstract of a paper on the subject, Analysis and Verification of Two-Phase Commit & Three-Phase Commit Protocols, by Muhammad Atif, to whet your appetite:
We also apply our method to its “amended” variant, the Three-Phase Commit Protocol (3PC) and prove it to be erroneous for simultaneous site failures
I found this paper to provide a foothold into the literature. There's no small amount of it on this subject, if you want to delve in.
In the two-phase commit the coordinator sends a prepare message to all participants (nodes) and waits for their answers. The coordinator then sends their answers to all other sites. Every participant waits for these answers from the coordinator before committing to or aborting the transaction.
The two-phase commit protocol also has limitations in that it is a blocking protocol
. For example, participants will block resource processes while waiting for a message from the coordinator. If for any reason this fails, the participant will continue to wait and may never resolve its transaction. Therefore the resource could be blocked indefinitely. On the other hand, a coordinator will also block resources while waiting for replies from participants. In this case, a coordinator can also block in definitely if no acknowledgement is received from the participant.
However, the three-phase protocol introduces a third phase called the pre-commit
. The aim of this is to 'remove the uncertainty period for participants that have committed and are waiting for the global abort or commit message from the coordinator.
When receiving a pre-commit message, participants know that all others have voted to commit. If a pre-commit message has not been received the participant will abort and release any blocked resources.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With