I'd like to better understand what fail-fast and fail-safe mean.
At first glance, it seems to me that fail-fast means we want the system to fail clearly as soon as anything unexpected happens. For example, if a factory can't create an instance of an object, then under the fail-fast principle we really don't want the factory to return null, an empty object, or a partially initialized object that the application might, by chance, use correctly. Most of the time we would get unexpected behaviour, or an unexpected exception raised at another level, which would not tell us that the real problem is in the factory. Is that what this principle means?
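The factory case can be sketched roughly as follows. This is only an illustration: `FailFastFactoryDemo`, `Connection`, `createConnection` and the `host`/`port` settings are hypothetical names, not taken from any real library. The point is that the factory throws at the place where the problem is detected instead of handing back null or a half-built object.

```java
import java.util.Map;

public class FailFastFactoryDemo {

    /** Hypothetical product type, used only for this illustration. */
    public static final class Connection {
        public final String host;
        public final int port;

        Connection(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Hypothetical factory method. It validates every required setting and
     * throws immediately, so the error message points at the factory.
     */
    public static Connection createConnection(Map<String, String> config) {
        String host = config.get("host");
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("Missing required setting: host");
        }
        String rawPort = config.get("port");
        if (rawPort == null) {
            throw new IllegalArgumentException("Missing required setting: port");
        }
        int port;
        try {
            port = Integer.parseInt(rawPort);
        } catch (NumberFormatException e) {
            // Keep the original cause so the real problem is not hidden.
            throw new IllegalArgumentException("Setting 'port' is not a number: " + rawPort, e);
        }
        return new Connection(host, port);
    }

    public static void main(String[] args) {
        Connection ok = createConnection(Map.of("host", "localhost", "port", "8080"));
        System.out.println("Created connection to " + ok.host + ":" + ok.port);
        try {
            createConnection(Map.of("host", "localhost")); // "port" is missing
        } catch (IllegalArgumentException e) {
            System.out.println("Failed fast: " + e.getMessage());
        }
    }
}
```

A caller that receives null instead would typically only notice much later, through a NullPointerException somewhere unrelated to the factory.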
The fail-safe principle is quite hard for me to understand. The most common example in Java is about collections, their iterators and concurrent access. It's said that a collection/iterator that permits modifying a list while iterating over it is called fail-safe. This is usually done by iterating over a copy of the initial list. But in this example I don't really understand where the system fails, and thus why it's fail-safe. Where is the failure? We just iterate over a copy or not, depending on our needs... I don't see any match with the wiki definition of fail-safe...
So in articles like http://www.certpal.com/blogs/2009/09/iterators-fail-fast-vs-fail-safe/ they oppose fail-fast to fail-safe. What I just don't get is why this iteration over a copy is called fail-safe...
I found another example here: http://tutorials.jenkov.com/java-exception-handling/fail-safe-exception-handling.html It seems a lot more related to the initial definition of the fail-safe principle. My understanding of fail-safe is that when a system fails, we must ensure that the failure handler doesn't fail or, if it does, that the real initial problem is not hidden by the failure of the handler. In the given example the handler is right next to the code that fails initially, but that's not always the case. To me, fail-safe means something more like correctly handling the errors that could happen in the failure handlers themselves.
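As a rough sketch of what "the handler's failure must not hide the initial problem" can look like in Java: the class `SuppressedDemo` and the resource `FlakyResource` below are made up for illustration. With a plain try/finally, an exception thrown during cleanup replaces the original one; with try-with-resources, the original exception is kept and the cleanup failure is attached to it as a suppressed exception.

```java
public class SuppressedDemo {

    /** Hypothetical resource whose use and cleanup both fail. */
    static final class FlakyResource implements AutoCloseable {
        void use() {
            throw new IllegalStateException("primary failure");
        }

        @Override
        public void close() {
            throw new IllegalStateException("cleanup failure");
        }
    }

    /** try/finally: the exception from close() masks the one from use(). */
    static RuntimeException runWithFinally() {
        FlakyResource r = new FlakyResource();
        try {
            try {
                r.use();
            } finally {
                r.close();
            }
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }

    /** try-with-resources: the exception from use() survives as the primary one. */
    static RuntimeException runWithResources() {
        try (FlakyResource r = new FlakyResource()) {
            r.use();
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        RuntimeException masked = runWithFinally();
        System.out.println(masked.getMessage());                    // cleanup failure

        RuntimeException kept = runWithResources();
        System.out.println(kept.getMessage());                      // primary failure
        System.out.println(kept.getSuppressed()[0].getMessage());   // cleanup failure
    }
}
```

When try-with-resources does not fit, the same effect can be had by catching the cleanup exception and calling `addSuppressed` on the original one.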
So to me these two principles don't seem incompatible. What do you think? Can't a system fail fast and safely?
Fail-fast systems abort as fast as possible, exposing failures immediately and stopping the whole operation. Fail-safe systems, by contrast, don't abort an operation when a failure occurs. Such systems try to avoid raising failures as much as possible.
In Java, fail-fast and fail-safe are also used to describe how iterators over Collection objects react to concurrent modification, that is, a collection being modified while another task is iterating over it.
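Fail-safe iterators operate on a copy of the collection rather than on the original, which is why they do not fail when the original is modified and why they are called fail-safe. The iterators of CopyOnWriteArrayList and ConcurrentHashMap are the usual examples, although strictly speaking a ConcurrentHashMap iterator is weakly consistent rather than working on a copy.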
If the collection is modified (for example, elements are added or removed) while a thread is iterating over it, a fail-fast iterator throws a ConcurrentModificationException. A fail-safe iterator doesn't throw an exception. 2. Type of collection. ArrayList and HashMap are examples of fail-fast collections ...
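The difference between the two iterator behaviours can be seen in a small experiment. This is a sketch only; `IteratorDemo` and `removeWhileIterating` are names invented for the example, while ArrayList, CopyOnWriteArrayList and ConcurrentModificationException are the standard JDK classes.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IteratorDemo {

    /**
     * Removes "b" from the list while a for-each loop is iterating over it,
     * and returns the elements the loop actually visited.
     */
    static List<String> removeWhileIterating(List<String> list) {
        List<String> visited = new ArrayList<>();
        for (String s : list) {
            visited.add(s);
            if (s.equals("a")) {
                list.remove("b"); // structural modification during iteration
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Fail-fast: the ArrayList iterator notices the modification on its
        // next step and throws.
        List<String> arrayList = new ArrayList<>(List.of("a", "b", "c"));
        try {
            removeWhileIterating(arrayList);
        } catch (ConcurrentModificationException e) {
            System.out.println("ArrayList iterator failed fast");
        }

        // Snapshot iteration: the CopyOnWriteArrayList iterator keeps walking
        // the array that existed when the loop started.
        List<String> cowList = new CopyOnWriteArrayList<>(List.of("a", "b", "c"));
        System.out.println("visited: " + removeWhileIterating(cowList)); // visited: [a, b, c]
        System.out.println("list now: " + cowList);                      // list now: [a, c]
    }
}
```

Note that the fail-fast check is best effort: the JDK documentation says it should be used to detect bugs, not relied on for program correctness.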
It is better to avoid failure in the first place (fail safe), but if this is not possible, it is best to fail fast (to fail as quickly as possible).
The two are not opposites, but complementary.
As you say - I like my code to be as fail safe as possible, but where it isn't, I want it to fail fast.
Fail safe does not mean that something will not fail; it means that when it fails, it fails in a safe way. Something that cannot fail is failure proof, if that is possible at all.
A fail safe elevator jams at its present location if the cable breaks. The riders are inconveniently stuck, but conveniently not dead.
Consider the example of an iterator. The theory is that it is better to signal to the client code immediately that something is amiss, rather than to blindly return a valid-looking answer that may cause more serious problems down the line. If the client code is safety conscious, it has the opportunity to intervene and recover right away. So in this instance, fail safe and fail fast are compatible, the latter being a strategy to achieve the former.
On the other hand, consider a web browser in the hands of someone who is not comfortable with computers. They are trying to see what time their movie starts. Let's say that (heaven forbid) the HTML on the page is not well-formed. If the renderer were to fail fast, it might decide to abandon rendering the information the user wants to see because a preceding <HR> tag is spelled <H>. In this case, it is better to just blunder on, rendering the page as best as possible. The error might be insignificant and never caught, or it may be caught much, much later when someone finally notices that the page doesn't look quite right. So here is an example where fail fast is not a good strategy for fail safe.
If the web page were my online banking application, I'd surely want it to blow up spectacularly (with a rollback, of course) if the slightest thing goes wrong. So then fail fast once again becomes the strategy of choice for fail safe.
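A very reduced sketch of "blow up, but roll back first" might look like this. Everything here is hypothetical (`TransferDemo`, `Account`, `transfer`, in-memory balances in cents); a real banking system would rely on database transactions rather than on hand-written restore code.

```java
public class TransferDemo {

    /** Hypothetical in-memory account, for illustration only. */
    public static final class Account {
        public long balanceCents;

        public Account(long balanceCents) {
            this.balanceCents = balanceCents;
        }
    }

    /**
     * Moves money between two accounts. Any problem aborts the transfer
     * immediately (fail fast), but both balances are restored before the
     * exception leaves the method, so the failure itself is safe.
     */
    public static void transfer(Account from, Account to, long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("Amount must be positive: " + amountCents);
        }
        long fromBefore = from.balanceCents;
        long toBefore = to.balanceCents;
        try {
            from.balanceCents = Math.subtractExact(from.balanceCents, amountCents);
            if (from.balanceCents < 0) {
                throw new IllegalStateException("Insufficient funds");
            }
            to.balanceCents = Math.addExact(to.balanceCents, amountCents);
        } catch (RuntimeException e) {
            // Roll back to the state before the transfer, then rethrow.
            from.balanceCents = fromBefore;
            to.balanceCents = toBefore;
            throw e;
        }
    }

    public static void main(String[] args) {
        Account a = new Account(1_000);
        Account b = new Account(0);
        try {
            transfer(a, b, 5_000);
        } catch (IllegalStateException e) {
            System.out.println("Transfer aborted: " + e.getMessage());
        }
        System.out.println(a.balanceCents + " / " + b.balanceCents); // 1000 / 0
    }
}
```

Here fail fast is the mechanism and the rollback is what makes the failure safe, which matches the point that the two ideas work together.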
My point is that fail safe is a concept unto itself, and that fail fast may or may not be a particular technique that contributes to failure safety.