
Challenges and Best Practices for Failing Over Services

Does anyone know of any established best practices for running Windows services (in my case, developed in .NET) such that they will (automatically) fail over correctly to another server, for high availability purposes?

The main ways I can see this being done are either starting up the secondary server when required (in which case there needs to be something monitoring the other server), or having both services running together (in which case they need to synchronize their work so they don't try to do the same things).

Is there a pattern or model for this sort of problem? I know the exact situation will make a big difference, but it does seem like a fairly common issue.

Thanks

John

John, asked Nov 18 '09 20:11



1 Answer

Here's what has worked for me.

From an infrastructure standpoint, you will need two Windows servers that are clustered. (Two standard Windows Server boxes will do; the clustering piece can be installed and configured afterwards, and most sysadmins should know how to do this.) Next, install your service on both nodes of the cluster, leave it stopped on both, and set it to MANUAL startup. Then add a clustered resource for your service in the Windows Cluster Administrator, so that the cluster starts and stops the service on whichever node is active. Let the Windows cluster decide when your service is running and on which node. This is the easy part of clustering your service.

From the service standpoint, you will want to design your service to be as stateless as possible. That is fairly generic advice, but how far you can take it really depends on what your service is doing. In the design, just assume that at some point during the code's lifetime it will stop at the worst possible time. How will the service on node2 know where to pick up where node1 left off? That's the hard part that you need to design for. Depending on what your service is doing, you can record the last completed task in a db table or a shared data file that both nodes can reach. Alternatively, you could have it start from the beginning and check whether each task has already been completed before acting on it.
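To make the "record what has been completed" idea concrete, here is a rough sketch, written in Python for brevity even though your service is .NET; the same shape should carry over. It is only an illustration under some assumptions: the table name `completed_tasks` and the helpers `open_store`, `is_done` and `run_pending` are made up for this example, and SQLite merely stands in for whatever shared database both nodes can reach.

```python
import sqlite3


def open_store(path):
    """Open the (hypothetical) shared store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS completed_tasks (task_id TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn


def is_done(conn, task_id):
    """Return True if some node has already recorded this task as completed."""
    row = conn.execute(
        "SELECT 1 FROM completed_tasks WHERE task_id = ?", (task_id,)
    ).fetchone()
    return row is not None


def run_pending(conn, task_ids, handler):
    """Run every task that is not yet recorded as completed.

    The task is recorded only after the handler returns, so a crash in
    between means the task runs again on the next node. The handler
    should therefore be safe to repeat.
    """
    ran = []
    for task_id in task_ids:
        if is_done(conn, task_id):
            continue  # finished before the failover, skip it
        handler(task_id)
        conn.execute(
            "INSERT OR IGNORE INTO completed_tasks (task_id) VALUES (?)",
            (task_id,),
        )
        conn.commit()
        ran.append(task_id)
    return ran
```

When the cluster starts the service on node2, it would call `run_pending` with the same task list and simply skip whatever node1 had already recorded. Note the window between running a task and recording it: if the node dies there, the task is repeated, which is why the work itself should tolerate being done twice.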

Again, it is really going to depend on what the service needs to accomplish. Hope this helps.

Walter, answered Oct 18 '22 11:10