
AppFabric Redundancy

We just tested an AppFabric cluster of 2 servers in which we removed the "lead" server. The second server times out on every request made to it, with the error:

Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>: There is a temporary failure. Please retry later. (One or more specified Cache servers are unavailable, which could be caused by busy network or servers. Ensure that security permission has been granted for this client account on the cluster and that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Retry later.)

In practice this means that if one server in the cluster goes down, they all go down. (Note that we are not using Windows clustering; we are only linking multiple AppFabric cache servers to each other.)

I need the cluster to continue operating even if a single server goes down. How do I do this?

(I realize this question borders on Server Fault territory, but IMHO developers should know this.)

Tedd Hansen asked Jan 20 '11 12:01


2 Answers

You'll have to install the AppFabric cache on at least three lead servers for the cache to survive a single server crash. The docs state that the cluster only goes down if a "majority" of the lead servers go down, but the fine print explains that 1 out of 2 counts as a majority. I've verified that removing a server from a three-lead-server cluster works as advertised.
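To make the "1 out of 2 is a majority" point concrete, here is a small sketch that models the rule described above: the cluster stays up only while a strict majority of the lead servers is still running. This is a simplified model of that rule, not AppFabric's own code; the function name `cluster_survives` is hypothetical.

```python
def cluster_survives(lead_hosts: int, failed_leads: int) -> bool:
    """Return True if a strict majority of lead hosts is still running.

    Assumption: models the "majority of lead servers" rule from the docs,
    not AppFabric's actual implementation.
    """
    if lead_hosts < 1 or not 0 <= failed_leads <= lead_hosts:
        raise ValueError("need at least one lead host and 0 <= failed <= total")
    alive = lead_hosts - failed_leads
    # Strict majority: more than half of the configured lead hosts are alive.
    return 2 * alive > lead_hosts


if __name__ == "__main__":
    for leads in (1, 2, 3, 4, 5):
        state = "up" if cluster_survives(leads, 1) else "down"
        print(f"{leads} lead hosts, 1 down -> {state}")
```

With 2 lead servers, losing one leaves 1 of 2 alive, which is not more than half, so the cluster goes down; with 3, losing one leaves 2 of 3 and the cluster stays up. Note that a 4th lead server does not buy tolerance for a second failure (2 of 4 is not a strict majority), which is why odd counts are the usual choice.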

John Hann answered Sep 25 '22 01:09


Typical distributed-systems concept. For a read or write quorum to form in an ensemble that must tolerate f failed servers, you need 2f + 1 servers in total, so that a majority (f + 1) is still up after f of them fail. I think AppFabric, like any consensus-based CP system (in the CAP theorem sense), needs such a majority for the cluster to keep working.
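The 2f + 1 figure follows from requiring that the servers left after f failures still form a strict majority of the n configured servers (assuming a simple majority quorum, as described above):

```latex
\begin{aligned}
n - f &> \tfrac{n}{2} && \text{(survivors must be a strict majority)} \\
n &> 2f \\
n &\ge 2f + 1 && \text{(since } n \text{ and } f \text{ are integers)}
\end{aligned}
```

For f = 1 this gives n ≥ 3, which matches the three-lead-server setup in the other answer.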

--Sai

Sai Venkat answered Sep 24 '22 01:09