I'm looking for some good articles on fault tolerant software architectures. Could I please have some recommendations.
Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail.
Fault-tolerant software assures system reliability by using protective redundancy at the software level. There are two basic techniques for obtaining fault-tolerant software: RB scheme and NVP. Both schemes are based on software redundancy assuming that the events of coincidental software failures are rare.
A highly fault-tolerant system might continue at the same level of performance even though one or more components have failed. For example, a building with a backup electrical generator will provide the same voltage to wall outlets even if the grid power fails.
BIST stands for Build in Self Test. System carries out the test of itself after a certain period of time again and again, that is BIST technique for hardware fault-tolerance.
This article about Software Fault Handling techniques covers the following topics:
Handbook of Software Reliability Engineering you can read it in pdf. One of the main principles of software reliability is fault tolerance.
Take a look at chapter 14 Fault-Tolerant software.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With