
Resources about crash-safe and fault-tolerance programming

I like the LWN article "Crash-only software" and I would like to learn more about crash-safe and fault-tolerant programming.

It is surprisingly hard to ensure that persistent state stays consistent when faults occur. I am not even talking about distributed operations; this is hard on a single node, too. Even the plain Berkeley DB (BDB Data Store or BDB Concurrent Data Store) can end up with a corrupted database if the system crashes. Not only can high-level application constraints be violated; the database may not even open correctly after the crash.

What are good resources about crash-safe and fault-tolerant designs, approaches, and programming?

Resources that focus on C++ and POSIX environments would be especially appreciated.

dmeister asked Mar 08 '10 22:03




2 Answers

Akka is a framework for Java and Scala that is written with let-it-crash in mind. See this article and this presentation for an introduction to actors and let-it-crash. The approach is also known as fail-fast, or the worker/supervisor style.
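The worker/supervisor idea can be approximated with plain POSIX processes: the supervisor forks a worker, waits for it with waitpid(2), and restarts it if it was killed by a signal or exited with a non-zero status. This is a minimal sketch, not Akka's or Erlang's API. `supervise` and `SuperviseResult` are hypothetical names, and the simple restart cap stands in for Erlang's restart intensity (maximum restarts within a time window). It assumes the calling process is single-threaded, because after fork() in a multithreaded program the child may only safely call async-signal-safe functions.

```cpp
#include <cerrno>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type for the sketch.
struct SuperviseResult {
    int restarts;    // how many times the worker was restarted
    bool succeeded;  // true if the worker finally exited with status 0
};

// Run `work` in a child process. If the child crashes (dies from a signal)
// or exits non-zero, start a fresh child, up to `max_restarts` times.
SuperviseResult supervise(const std::function<int(int attempt)>& work,
                          int max_restarts) {
    for (int attempt = 0;; ++attempt) {
        pid_t pid = ::fork();
        if (pid < 0) return {attempt, false};  // could not start a worker

        if (pid == 0) {
            // Child: do the work, then exit without running the parent's
            // atexit handlers or flushing its stdio buffers a second time.
            ::_exit(work(attempt));
        }

        // Parent: wait for the worker, retrying if interrupted by a signal.
        int status = 0;
        while (::waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) return {attempt, false};
        }

        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return {attempt, true};            // clean exit: done
        if (attempt >= max_restarts)
            return {attempt, false};           // give up, escalate to caller
        // Otherwise the worker crashed or failed: loop and restart it.
    }
}
```

Because each restart is a fresh process, a crash cannot leave corrupted in-memory state behind; anything that must survive a restart has to live in persistent storage that is written crash-safely.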

Two good presentations on Erlang are Systems that Never Stop (and Erlang) and Message Passing Concurrency in Erlang.

Theron is an actor library for C++, and I think there is also something in Boost.
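To show what an actor library provides at its core, here is a hand-rolled sketch rather than Theron's actual API: one thread owns the actor's state and handles messages from a mailbox one at a time, and other threads communicate only through `send()`. The `Actor` class is hypothetical and deliberately omits supervision. In Erlang or Akka, an exception escaping the handler would crash just that actor and its supervisor would restart it; here it would reach the thread boundary and call std::terminate.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical minimal actor: a private thread drains a mailbox and runs
// `handler` for each message, so the handler never runs concurrently.
template <typename Msg>
class Actor {
public:
    explicit Actor(std::function<void(const Msg&)> handler)
        : handler_(std::move(handler)), worker_([this] { run(); }) {}

    // Stop accepting work, process what is already queued, then join.
    ~Actor() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    Actor(const Actor&) = delete;
    Actor& operator=(const Actor&) = delete;

    // The only way other threads interact with the actor.
    void send(Msg m) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            mailbox_.push(std::move(m));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return stopping_ || !mailbox_.empty(); });
            if (mailbox_.empty()) return;  // stopping and nothing left to do
            Msg m = std::move(mailbox_.front());
            mailbox_.pop();
            lock.unlock();  // never hold the lock while handling a message
            handler_(m);
        }
    }

    std::function<void(const Msg&)> handler_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Msg> mailbox_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts after the members above exist
};
```

Usage: `Actor<int> counter([&total](const int& x) { total += x; }); counter.send(1);` The destructor drains the mailbox and joins the thread, so `total` can be read safely once the actor has been destroyed.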

Erlang can also call C or C++ code; see this for a discussion. Java, Scala, and Akka can call C++ code as well.

(If you like C++, I suggest having a look at Scala: a very nice language, and better than Java if you come from C++.)

Jonas Bonér's presentation Scalability, Availability & Stability Patterns is also a good one on this topic.

oluies answered Oct 07 '22 09:10



In languages like Erlang and Scala, the actor model is what the let-it-crash model builds on. See this article.

TTMAN answered Oct 07 '22 09:10
