Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Windows Services -- High availability scenarios and design approach

Let's say I have a standalone windows service running in a windows server machine. How to make sure it is highly available?

1). What are all the design level guidelines that you can propose?

2). How to make it highly available like primary/secondary, eg., the clustering solutions currently available in the market

3). How to deal with cross-cutting concerns in case any fail-over scenarios

If any other you can think of please add it here ..

Note: The question is only related to windows and windows services, please try to obey this rule :)

like image 863
asyncwait Avatar asked Apr 07 '10 12:04

asyncwait


People also ask

What is high availability architecture?

The solution lies in High Availability (HA) architecture. The approach defines the modules, components and implementation of services of a system for optimal operational performance, even at the time of high loads. It simply refers to a component or system that is continuously operational for a desirably long period of time.

What are the main test scenarios related to high availability?

The main test scenarios related to high availability are given below: 1. Evaluate various availability scenarios in the real world. Some of the scenarios include simulating software failure, network failure, geo-specific availability, service failure, and availability during peak loads. 2. For each scenario, design availability test cases.

How to ensure system availability?

Make sure you will evangelize to every architecture and design team about the right priority of the availability as a system requirement. Following system characteristics have shown to have contributed to system availability: Some examples of this are to never have only a single VM behind a VIP or never store only a single copy of your data.

What is availability in software testing?

"Availability" is a measure of the ability of clients to connect with and use a resource at any point in time. If a resource is not available, clients cannot use it. One way to understand high availability is to contrast it with fault tolerance.


1 Answers

To keep the service at least running you can arrange for the Windows Service Manager to automatically restart the service if it crashes (see the Recovery tab on the service properties.) More details are available here, including a batch script to set these properties - Restart a windows service if it crashes

High availability is more than just keeping the service up from the outside - the service itself needs to be built with high-availabiity in mind (i.e. use good programming practices throughout, appropriate datastructures, pairs resource aquire and release), and the whole stress-tested to ensure that it will stay up under expected loads.

For idempotent commands, tolerating intermittent failures (such as locked resources) can be achieved by re-invoking the command a certain number of times. This allows the service to shield the client from the failure (up to a point.) The client should also be coded to anticipate failure. The client can handle service failure in several ways - logging, prompting the user, retrying X times, logging a fatal error and exiting are all possible handlers - which one is right for you depends upon your requirements. If the service has "conversation state", when service fails hard (i.e. process is restarted), the client should be aware of and handle ths situation, as it usually means current conversation state has been lost.

A single machine is going to be vulnerable to hardware failure, so if you are going to use a single machine, then ensure it has redundant components. HDDs are particularly prone to failure, so have at least mirrored drives, or a RAID array. PSUs are the next weak point, so redundant PSU is also worthwhile, as is a UPS.

As to clustering, Windows supports service clustering, and manages services using a Network Name, rather than individual Computer names. This allows your client to connect to any machine running the service and not a hard-coded name. But unless you take additional measures, this is Resource failover - directing requests from one instance of the service to another. Converstaion state is usually lost. If your services are writing to a database, then that should also be clustered to also ensure reliabiity and ensure changes are available to the entire cluster, and not just the local node.

This is really just the tip of the iceberg, but I hope it gives you ideas to get started on further research.

Microsoft Clustering Service (MSCS)

like image 95
mdma Avatar answered Oct 15 '22 20:10

mdma