Likewise, are there design patterns that should be avoided?
Each of these patterns decouples one or more parts of the application from the others, which increases availability and uptime for critical areas such as external users and customers, and so improves overall application resilience.
Design patterns provide general solutions, documented in a format that doesn't require specifics tied to a particular problem. In addition, patterns let developers communicate about software interactions using well-known, well-understood names.
I assume you are writing a server-type application (let's leave web apps aside for a while; there are some good off-the-shelf solutions that can help there), so let's look at the "I've got this great new type of server I have to write, but I want it to be HA" problem.
In a server implementation, requests from clients are usually converted (in one form or another) into some event or command pattern, and are then executed from one or more queues.
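For example, a minimal sketch of that idea, assuming a hypothetical `Command` interface and an in-memory queue (in a real HA system the queue would need to be durable, as discussed next):

```java
// Illustrative sketch: a client request modelled as a command object
// and placed on a queue for execution. All names here are hypothetical.
import java.io.Serializable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

interface Command extends Serializable {
    void execute();
}

class TransferFundsCommand implements Command {
    private final String fromAccount;
    private final String toAccount;
    private final long amountCents;

    TransferFundsCommand(String from, String to, long amountCents) {
        this.fromAccount = from;
        this.toAccount = to;
        this.amountCents = amountCents;
    }

    @Override
    public void execute() {
        // ... perform the transfer against application state ...
    }
}

class CommandQueue {
    private final BlockingQueue<Command> queue = new LinkedBlockingQueue<>();

    // Called by the layer that receives client requests.
    void submit(Command cmd) {
        queue.add(cmd);
    }

    // Called by the (single-threaded) executor loop.
    Command take() throws InterruptedException {
        return queue.take();
    }
}
```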
So, the first problem: you need to store events/commands in a manner that will survive in the cluster (i.e. when a new node takes over as master, it looks at the next command that needs executing and carries on).
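A minimal sketch of what that storage contract might look like (the `DurableCommandLog` name and its methods are assumptions; the `Command` type is from the sketch above):

```java
// Hypothetical interface for a command log that survives node failure.
// A real implementation might replicate entries via a consensus layer,
// a shared database, or a distributed cache.
import java.util.Optional;

interface DurableCommandLog {
    // Persist (and replicate) the command before acknowledging the client.
    long append(Command cmd);

    // The next command that has been appended but not yet marked complete;
    // a newly promoted master starts here.
    Optional<Command> nextPending();

    // Mark a command as fully executed so it is not re-run after failover.
    void markComplete(long commandId);
}
```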
Let's start with a single-threaded server implementation (the easiest; the concepts still apply to multi-threaded servers, but those come with their own set of issues). When a command is being processed, you need some sort of transaction processing.
Another concern is managing side effects: how do you handle failure of the current command? Where possible, handle side effects in a transactional manner, so that they are all or nothing. I.e. if the command changes state variables but crashes halfway through execution, being able to return to the "previous" state is great, because it lets the new master node resume the crashed command by simply re-running it. A good approach is to break the side effects into smaller tasks that can each be run on any node: store the main request's start and end tasks, with lots of little tasks that handle, say, only one side effect each.
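One way to picture that, with purely illustrative names; the key point is that each task is small and idempotent, so a new master can safely re-run whatever was left pending:

```java
// Sketch: a command's side effects recorded as small, individually
// completable tasks so a new master can resume at the right point.
import java.util.List;

enum TaskStatus { PENDING, DONE }

interface SideEffectTask {
    String id();
    TaskStatus status();
    void run();          // should be idempotent so re-running after a crash is safe
}

class ResumableCommandRunner {
    // Re-runs only the tasks that were not finished before the crash.
    void resume(List<SideEffectTask> tasks) {
        for (SideEffectTask task : tasks) {
            if (task.status() == TaskStatus.PENDING) {
                task.run();
            }
        }
    }
}
```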
This also introduces other issues that will affect your design. Those state variables are not necessarily database updates. They could be shared state (say a finite state machine for an internal component) that also needs to be distributed across the cluster. So you need a pattern for managing changes such that the master code always sees a consistent version of the state it needs, and then commits that state across the cluster. Using some form of immutable data storage (at least from the perspective of the master thread doing the update) is useful: all updates are effectively done on new copies, which must go through some sort of mediator or facade that only updates the local in-memory copies after the change has been applied across the cluster (or across the minimum number of members required for data consistency).
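A rough sketch of such a facade, assuming a hypothetical `ClusterReplicator` that blocks until enough members have acknowledged the new state:

```java
// Sketch of a facade that applies updates to an immutable copy, commits the
// new version across the cluster first, and only then swaps the local
// in-memory reference. ClusterReplicator is a hypothetical replication layer.
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

interface ClusterReplicator<S> {
    // Blocks until enough members have acknowledged the new state.
    boolean replicate(S newState);
}

class ClusteredStateFacade<S> {
    private final AtomicReference<S> localState;
    private final ClusterReplicator<S> replicator;

    ClusteredStateFacade(S initial, ClusterReplicator<S> replicator) {
        this.localState = new AtomicReference<>(initial);
        this.replicator = replicator;
    }

    S current() {
        return localState.get();
    }

    // The master thread passes a function that builds a new immutable state.
    boolean update(UnaryOperator<S> change) {
        S updated = change.apply(localState.get());
        if (replicator.replicate(updated)) {   // commit across the cluster first
            localState.set(updated);           // then expose it locally
            return true;
        }
        return false;                          // caller decides how to handle failure
    }
}
```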
Some of these issues are also present in master/worker systems.
You also need good error management, because the number of things that can go wrong on a state update increases (the network is now involved).
I use the state pattern a lot. Instead of one-line updates, for side effects you want to send requests/responses and use conversation-specific FSMs to track the progress.
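For example, a tiny conversation FSM might look like this (states and names are illustrative):

```java
// Minimal sketch of a per-conversation state machine used to track the
// request/response progress of a single side effect.
enum ConversationState { IDLE, REQUEST_SENT, RESPONSE_RECEIVED, FAILED }

class SideEffectConversation {
    private ConversationState state = ConversationState.IDLE;

    void onRequestSent() {
        if (state == ConversationState.IDLE) {
            state = ConversationState.REQUEST_SENT;
        }
    }

    void onResponse(boolean success) {
        if (state == ConversationState.REQUEST_SENT) {
            state = success ? ConversationState.RESPONSE_RECEIVED
                            : ConversationState.FAILED;
        }
    }

    ConversationState state() {
        return state;
    }
}
```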
Another issue is the representation of endpoints: does a client connected to the old master need to be able to reconnect to the new master and then listen for its results? Or do you simply cancel all pending results and let the clients resubmit? If you allow pending requests to be processed, you need a nice way to identify endpoints (clients), i.e. some sort of client id in a lookup.
You also need cleanup code and the like (i.e. you don't want data waiting for a client to reconnect to wait forever).
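A simple sketch of such a client lookup with expiry (the names and the 10-minute limit are assumptions):

```java
// Sketch of a client lookup keyed by a stable client id, so a reconnecting
// client can be matched with results produced while it was disconnected,
// plus a simple expiry so parked results do not wait forever.
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class ClientRegistry {

    private static final Duration MAX_WAIT = Duration.ofMinutes(10); // assumed limit

    private record Parked(Object result, Instant parkedAt) {}

    private final Map<String, Queue<Parked>> pending = new ConcurrentHashMap<>();

    // Results are parked under the client id until the client (re)connects.
    void park(String clientId, Object result) {
        pending.computeIfAbsent(clientId, id -> new ConcurrentLinkedQueue<>())
               .add(new Parked(result, Instant.now()));
    }

    // Called when the client reconnects, possibly to the new master node.
    Queue<Parked> claim(String clientId) {
        return pending.remove(clientId);
    }

    // Periodic cleanup so results for clients that never return are dropped.
    void expireStale() {
        Instant cutoff = Instant.now().minus(MAX_WAIT);
        pending.values().forEach(q -> q.removeIf(p -> p.parkedAt().isBefore(cutoff)));
        pending.values().removeIf(Queue::isEmpty);
    }
}
```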
Lots of queues are used, so a lot of people end up using a message bus (JMS, say, for Java) to push events in a transactional manner.
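With plain JMS, a transacted send might look roughly like this (the `ConnectionFactory` is assumed to come from whatever provider you use, e.g. ActiveMQ):

```java
// Hedged sketch of pushing a command onto a JMS queue inside a local
// transaction, so the send only becomes visible on commit.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

class TransactionalCommandPublisher {

    private final ConnectionFactory connectionFactory;

    TransactionalCommandPublisher(ConnectionFactory connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    void publish(java.io.Serializable command, String queueName) throws JMSException {
        Connection connection = connectionFactory.createConnection();
        try {
            // Transacted session: the message is only visible after commit().
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            Queue queue = session.createQueue(queueName);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createObjectMessage(command));
            session.commit();   // or session.rollback() on failure
        } finally {
            connection.close();
        }
    }
}
```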
Terracotta (again, for Java) solves a lot of this for you: just update the memory, and Terracotta acts as your facade/mediator; it injects the aspects for you.
Terracotta (I don't work for them) introduces the concept of a "super static", so you get these cluster-wide singletons, which are cool, but you need to be aware of how this affects testing and your development workflow; i.e. use lots of composition instead of inheritance from concrete implementations for good reuse.
For web apps, a good app server can help with session variable replication, and a good load balancer works. In some ways, exposing your service via REST (or your web service method of choice) is an easy way to write a multi-threaded service, but it will have performance implications. Again, it depends on your problem domain.
Message servers (say, JMS) are often used to introduce loose coupling between different services. With a decent message server you can do a lot of message routing (again, Apache Camel or similar does a great job), e.g. a sticky consumer against a cluster of JMS producers, which also allows for good failover. JMS queues and the like can provide a simple way to distribute commands in the cluster, independent of master/slave. (Again, it depends on whether you are doing LOB work or writing a server/product from scratch.)
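As an illustration of the Camel side, a route that consumes commands and fails over between two downstream endpoints could look something like this (the endpoint URIs are assumptions for the sketch):

```java
// Hedged sketch of a Camel route that consumes commands from a JMS queue
// and fails over between two hypothetical downstream endpoints.
import org.apache.camel.builder.RouteBuilder;

class CommandRoutingRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:commands")
            .loadBalance().failover()       // try the next endpoint if one fails
                .to("jms:nodeA.commands")
                .to("jms:nodeB.commands")
            .end();
    }
}
```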