Does a disaster-proof language exist?

Question

When creating system services that must be highly reliable, I often end up writing a lot of 'failsafe' mechanisms for situations like: communication that is lost (for instance, the connection to the DB), or what happens if the power is lost and the service restarts... how to pick up the pieces and continue correctly (keeping in mind that the power could go out again while picking up the pieces...), etc.

I can imagine that, for systems that are not too complex, a language that catered for this would be very practical: a language that remembers its state at any given moment, even if the power is cut off, and continues where it left off.

Does this exist yet? If so, where can I find it? If not, why can't this be realized? It would seem to me very handy for critical systems.

P.S. If the DB connection is lost, it would signal that a problem arose and that manual intervention is needed. The moment the connection is restored, it would continue where it left off.

EDIT: Since the discussion seems to have died off, let me add a few points (while waiting until I can add a bounty to the question).

The Erlang response seems to be top rated right now. I'm aware of Erlang and have read the Pragmatic Bookshelf book by Armstrong (the principal creator). It's all very nice (although functional languages make my head spin with all the recursion), but the 'fault tolerant' bit doesn't come automatically. Far from it. Erlang offers supervisors and other mechanisms to monitor a process and restart it if necessary. However, to build something that works properly with these structures, you need to be quite the Erlang guru, and you need to make your software fit all these frameworks. Also, if the power drops, the programmer still has to pick up the pieces and try to recover the next time the program starts.

What I'm searching is something far simpler:

Imagine a language (as simple as PHP, for instance) in which you can do things like run DB queries and act on the results, manipulate files, manipulate folders, etc.

Its main feature, however, should be this: if the power dies and the machine restarts, it picks up where it left off (so it remembers not only where it was, but also the state of its variables). Also, if it stopped in the middle of a file copy, it will properly resume that too, etc.
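For the file-copy part of the wish list, one common application-level pattern is to copy into a temporary file, resume from that file's current size after a crash, and only rename it into place once it is complete and flushed. The sketch below assumes the source file does not change between runs and that, after a crash, the partial file contains only bytes that were actually written (typical on journaling filesystems, but not guaranteed everywhere); `resumable_copy` and the `.part` suffix are illustrative names, not an existing API.

```python
import os

CHUNK = 64 * 1024  # copy granularity; an arbitrary choice for this sketch


def resumable_copy(src: str, dst: str) -> None:
    """Copy src to dst so that an interrupted copy can be resumed.

    Data goes to a temporary "<dst>.part" file first. If that file already
    exists (left by a run that died mid-copy), copying continues from its
    current size instead of from byte 0. Only a finished, fsync'ed copy is
    renamed to dst, so dst is never seen half-written.
    """
    part = dst + ".part"
    offset = os.path.getsize(part) if os.path.exists(part) else 0
    with open(src, "rb") as fin, open(part, "ab") as fout:
        fin.seek(offset)  # skip the bytes an earlier run already copied
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # data is on disk before the rename below
    os.replace(part, dst)  # atomic rename on POSIX filesystems
```

The rename is what makes "copied or not copied" a clean distinction; fully durable code would also fsync the containing directory after the rename.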

Last but not least, if the DB connection drops and can't be restored, the language just halts, signals for human intervention (via syslog, perhaps), and then carries on where it left off.
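Within a single running process, the "halt, signal, carry on" behavior can be approximated with a retry loop around each step. This is a minimal sketch under stated assumptions: `run_until_connected` and `step` are hypothetical names, the step is assumed to raise `ConnectionError` when the DB is unreachable and to be safe to re-run, and alerts go through Python's `logging` module (a `logging.handlers.SysLogHandler` could route them to syslog). It does not survive a power loss on its own; that still needs persisted state.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("service")


def run_until_connected(step: Callable[[], T], retry_delay: float = 5.0) -> T:
    """Run `step`; on ConnectionError, alert once, wait, and retry the same step."""
    alerted = False
    while True:
        try:
            result = step()
        except ConnectionError as exc:
            if not alerted:
                # Ask for human intervention once, not on every retry.
                log.error("connection lost, waiting for intervention: %s", exc)
                alerted = True
            time.sleep(retry_delay)
            continue
        if alerted:
            log.warning("connection restored, resuming")
        return result
```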

A language like this would make programming a lot of services much easier.

EDIT: Judging by all the comments and answers, it seems that such a system doesn't exist, and probably won't in the foreseeable future because it is (nearly?) impossible to get right.

Too bad... Again, I'm not looking for this language (or framework) to get me to the moon, or to monitor someone's heart rate. I want it for small periodic services/tasks, which always end up with loads of code handling edge cases (a power failure somewhere in the middle, connections dropping and not coming back up), where a "pause here, fix the issues, continue where you left off" approach would work well.

(Or a checkpoint approach, as one of the commenters pointed out, like in a video game: set a checkpoint, and if the program dies, restart there the next time.)
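The checkpoint idea can be sketched in a few lines if the task's state fits in a JSON-serializable dict. The key trick, assumed here rather than taken from any particular framework, is to write each checkpoint to a temporary file, fsync it, and atomically rename it over the old one, so a power cut leaves either the previous checkpoint or the new one, never a half-written file. `save_checkpoint`, `load_checkpoint`, and the summing task are hypothetical illustrations.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Atomically replace the checkpoint file with `state`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before the rename below
    os.replace(tmp, path)  # readers see the old or new checkpoint, never half of one


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last saved state, or a copy of `default` on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


def process_items(items: list, path: str) -> dict:
    """Hypothetical periodic task: sum `items`, checkpointing after each one."""
    state = load_checkpoint(path, {"next": 0, "total": 0})
    for i in range(state["next"], len(items)):
        state["total"] += items[i]  # the "work" for item i
        state["next"] = i + 1
        save_checkpoint(path, state)  # a crash after this resumes at item i + 1
    return state
```

The work here is purely in memory, so it is safe. If a step had an external side effect (a DB write, a file move), a crash between that effect and the checkpoint would repeat it on restart, so such steps would need to be idempotent.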

Bounty awarded: At the last possible minute, when everyone was concluding that it can't be done, Stephen C came up with Napier88, which seems to have the attributes I was looking for. Although it is an experimental language, it does prove that this can be done, and it is something worth investigating further.

I'll be looking at creating my own framework (with persistent state and snapshots perhaps) to add the features I'm looking for in .Net or another VM.

Thanks, everyone, for the input and the great insights.

Ira Baxter · Accepted Answer

Software Transactional Memory (STM) combined with nonvolatile RAM would probably satisfy the OP's revised question.

STM is a technique for implementing "transactions", i.e., sets of actions that are done effectively as an atomic operation, or not at all. Normally the purpose of STM is to let highly parallel programs interact over shared resources in a way that is easier to understand than traditional lock-that-resource programming, and that arguably has lower overhead by virtue of a highly optimistic, lock-free style of programming.

The fundamental idea is simple: all reads and writes inside a "transaction" block are recorded (somehow!); if two threads conflict on these sets (read-write or write-write conflicts) at the end of either of their transactions, one is chosen as the winner and proceeds, and the other is forced to roll back its state to the beginning of the transaction and re-execute.
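The record-validate-retry cycle described above can be shown with a toy, single-process sketch. Writes are buffered in the transaction, each read records the version it saw, and at commit time the read set is validated; if another transaction committed to something we read, our buffered writes are discarded and the whole block re-runs. The names `TVar` and `atomically` echo Haskell's STM library, but this is not that library's API; it uses one global commit lock and validates only the read set, so it illustrates the semantics, not a performant or complete STM.

```python
import threading
from typing import Any, Callable


class TVar:
    """A transactional variable: a value plus a version bumped on every commit."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()  # serializes validate-and-commit only


class Transaction:
    def __init__(self) -> None:
        self.reads: dict = {}   # TVar -> version seen at first read
        self.writes: dict = {}  # TVar -> value to install at commit

    def read(self, tv: TVar) -> Any:
        if tv in self.writes:  # see our own buffered writes
            return self.writes[tv]
        if tv not in self.reads:
            self.reads[tv] = tv.version  # record the version before the value
        return tv.value

    def write(self, tv: TVar, value: Any) -> None:
        self.writes[tv] = value  # buffered; invisible to others until commit


def atomically(fn: Callable[[Transaction], Any]) -> Any:
    """Run fn(tx) optimistically; on conflict, drop its writes and re-run it."""
    while True:
        tx = Transaction()
        result = fn(tx)
        with _commit_lock:
            # Conflict check: has anyone committed to a TVar we read?
            if all(tv.version == seen for tv, seen in tx.reads.items()):
                for tv, value in tx.writes.items():
                    tv.value = value
                    tv.version += 1
                return result
        # Validation failed: the buffered writes are discarded ("rollback").
```

In the answer's scenario, the commit step is where the transaction's new state would also be made durable, so a power failure before it behaves like any other failed validation.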

If one insisted that all computations were transactions, and the state at the beginning(/end) of each transaction was stored in nonvolatile RAM (NVRAM), a power fail could be treated as a transaction failure resulting in a "rollback". Computations would proceed only from transacted states in a reliable way. NVRAM these days can be implemented with Flash memory or with battery backup. One might need a LOT of NVRAM, as programs have a lot of state (see minicomputer story at end). Alternatively, committed state changes could be appended to log files on disk; this is the standard method used by most databases and by reliable filesystems.
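The log-file alternative can be sketched as an append-only log of committed changes: a transaction counts as committed only once its record has been fsync'ed, and on restart the state is rebuilt by replaying every complete record. This is a simplified illustration with hypothetical names (`DurableStore`, one JSON record per line); real databases add checksums per record, which this sketch omits, and only detect a torn final record by its missing newline.

```python
import json
import os


class DurableStore:
    """Key-value state whose committed changes survive a crash via an append-only log."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.state: dict = {}
        self._recover()

    def _recover(self) -> None:
        """Rebuild state by replaying every complete record in the log."""
        if not os.path.exists(self.log_path):
            return
        good_end = 0
        with open(self.log_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # torn final record from a crash mid-write: not committed
                self.state.update(json.loads(line))
                good_end += len(line)
        # Drop any torn tail so the next append starts on a clean record boundary.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_end)

    def commit(self, changes: dict) -> None:
        """Durably log one transaction's changes, then apply them in memory."""
        record = (json.dumps(changes) + "\n").encode()
        with open(self.log_path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # committed once this returns
        self.state.update(changes)
```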

The current question with STM is, how expensive is it to keep track of the potential transaction conflicts? If implementing STM slows the machine down by an appreciable amount, people will live with existing, slightly unreliable schemes rather than give up that performance. So far the story isn't good, but then the research is still early.

People haven't generally designed languages for STM; for research purposes, they've mostly enhanced Java with STM (see the Communications of the ACM article in June? of this year). I hear MS has an experimental version of C#. Intel has an experimental version for C and C++. The Wikipedia page has a long list. And the functional programming guys are, as usual, claiming that the side-effect-free property of functional programs makes STM relatively trivial to implement in functional languages.

If I recall correctly, back in the 70s there was considerable early work on distributed operating systems, in which processes (code + state) could travel trivially from machine to machine. I believe several such systems explicitly allowed for node failure, and could restart a process from a failed node using saved state on another node. Early key work was the Distributed Computing System (DCS) by Dave Farber. Because designing languages was popular back in the 70s, I recall that DCS had its own programming language, but I don't remember its name. If DCS didn't allow node failure and restart, I'm fairly sure the follow-on research systems did.

EDIT: A 1996 system that at first glance appears to have the properties you want is documented here. Its concept of atomic transactions is consistent with the ideas behind STM. (Which goes to prove there isn't a lot new under the sun.)

A side note: back in the 70s, core memory was still king. Core, being magnetic, was nonvolatile across power failures, and many minicomputers (and, I'm sure, the mainframes) had power-fail interrupts that notified the software some milliseconds ahead of the loss of power. Using that, one could easily store the register state of the machine and shut it down completely. When power was restored, control would return to a state-restoring point, and the software could proceed. Many programs could thus survive power blinks and restart reliably. I personally built a time-sharing system on a Data General Nova minicomputer; you could actually have it running 16 teletypes full blast, take a power hit, and have it come back up and restart all the teletypes as if nothing had happened. The change from cacophony to silence and back was stunning; I know, because I had to repeat it many times to debug the power-failure management code, and of course it made a great demo (yank the plug, deathly silence, plug it back in...). The name of the language that did this was, of course, Assembler :-}

Chris S · Answer

From what I know¹, Ada is often used in safety-critical (failsafe) systems.

Ada was originally targeted at embedded and real-time systems.

Notable features of Ada include: strong typing, modularity mechanisms (packages), run-time checking, parallel processing (tasks), exception handling, and generics. Ada 95 added support for object-oriented programming, including dynamic dispatch.

Ada supports run-time checks in order to protect against access to unallocated memory, buffer overflow errors, off-by-one errors, array access errors, and other detectable bugs. These checks can be disabled in the interest of runtime efficiency, but can often be compiled efficiently. It also includes facilities to help program verification.

For these reasons, Ada is widely used in critical systems, where any anomaly might lead to very serious consequences, i.e., accidental death or injury. Examples of systems where Ada is used include avionics, weapon systems (including thermonuclear weapons), and spacecraft.

N-version programming may also be helpful background reading.
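In N-version programming, several independently developed implementations of the same specification run on the same input and a voter picks the majority result, the goal being to mask design faults that are unlikely to be shared across versions. The sketch below is a minimal in-process majority voter with hypothetical names (`n_version_vote`); real N-version systems usually run the versions in separate processes or on separate hardware. Note that this technique targets software faults, not the power-loss recovery the question asks about.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")
T = TypeVar("T")


def n_version_vote(versions: Sequence[Callable[[A], T]], x: A) -> T:
    """Run every version on x and return the strict-majority answer."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version simply casts no vote
    if results:
        answer, votes = Counter(results).most_common(1)[0]  # results must be hashable
        if votes > len(versions) // 2:
            return answer
    raise RuntimeError("no majority: versions disagree or failed")
```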

¹That's basically one acquaintance who writes embedded safety-critical software.

Gregory Mostizky · Answer

I doubt that the language features you are describing are possible to achieve.

The reason is that it would be very hard to define common, general failure modes and how to recover from them. Think for a second about your sample application: some website with some logic and database access. And let's say we have a language that can detect a power shutdown and subsequent restart, and somehow recover from it. The problem is that it is impossible for the language to know how to recover.

Let's say your app is an online blog application. In that case it might be enough to just continue from the point where we failed, and all would be OK. However, consider a similar scenario for an online bank. Suddenly it's no longer smart to just continue from the same point. For example, if I was trying to withdraw some money from my account, and the computer died right after the checks but before it performed the withdrawal, and then it came back up one week later, it would give me the money even though my account is now in the negative.
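For this particular bank case, one application-level fix is to make the check and the debit a single atomic operation, so there is no "after the check, before the withdrawal" point to resume from; after a restart the program simply re-runs the whole step against the current balance. The sketch below uses Python's standard `sqlite3` module with a conditional `UPDATE`; the `accounts` table and `withdraw` function are hypothetical, and the choice of recovery strategy is still the application's, which is the answer's point.

```python
import sqlite3


def withdraw(conn: sqlite3.Connection, account: int, amount: int) -> bool:
    """Debit `amount` only if the balance covers it, as one atomic statement.

    The check (balance >= amount) and the debit happen in the same UPDATE,
    which either commits as a whole or not at all, so a crash cannot leave
    a stale check behind to be acted on later.
    """
    with conn:  # commit on success, roll back if an exception escapes
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account, amount),
        )
        changed = cur.rowcount  # 1 if the debit happened, 0 if funds were short
    return changed == 1
```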

In other words, there is no single correct recovery strategy, so this is not something that can be built into the language. What a language can do is tell you when something bad happens, but most languages already support that with exception-handling mechanisms. The rest is up to application designers to think about.

There are a lot of technologies that allow designing fault-tolerant applications: database transactions, durable message queues, clustering, hardware hot swapping, and so on. But it all depends on concrete requirements and how much the end user is willing to pay for it all.

Tags:

language-design

programming-languages

Toad
