Suppose all the databases involved in a distributed transaction implemented with two-phase commit signal that they are ready to commit and have the necessary locks. The coordinator signals to commit and all databases execute their portion of the transaction, but one SQL database encounters a divide-by-zero error as a result of a programming oversight that fails to consider that possibility. Since the coordinator already signaled commit to everyone what happens as a result of that divide-by-zero?
The second commit phase normally does not contain user code that can fail. The participating resource managers need to guarantee that no failure can occur. If this guarantee is violated no guarantees can be provided by the protocol.
Two phase commit tries to solve the Two Generals Problem. There is no full solution to this problem. TPC is an approximation.
Another way TPC can fail is in case of a network partition. Some resource managers might perform the final commit but some might not receive that message. Again, this problem is unsolvable. Even retries cannot solve it.
You can even trigger this problem under real world conditions: Run all participating nodes in a stress test and pull the network cable at an arbitrary point. With high probability your distributed databases are now inconsistent because some commit messages got lost an a very inconvenient time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With