We have a TIBCO EMS solution that uses built-in server failover in a 2-4 server environment. If the TIBCO admins fail over services from one EMS server to another, connections are supposed to be transferred to the new server automatically at the EMS service level. For our C# applications using the EMS service, this is not happening: our user connections are not being transferred to the new server after failover, and we're not sure why.
Our application connects to EMS at startup only, so if the TIBCO admins fail over after users have started our application, the users need to restart the app in order to reconnect to the new server (our EMS connection uses a server string including all 4 production EMS servers; if the first attempt fails, it moves to the next server in the string and tries again).
I'm looking for an automated approach that will attempt to reconnect to EMS periodically if it detects that the connection is dead, but I'm not sure how best to do that.
Any ideas? We are using TIBCO.EMS.dll version 4.4.2 and .NET 2.x (SmartClient app).
Any help would be appreciated.
First off, yes, I am answering my own question. It's important to note, however, that without ajmastrean, I would be nowhere. Thank you so much!
ONE: ConnectionFactory.SetReconnAttemptCount, SetReconnAttemptDelay, and SetReconnAttemptTimeout should be set appropriately. I think the default values retry too quickly (on the order of half a second between retries). Our EMS servers can take a long time to fail over because of network storage, etc., so 5 retries at half-second intervals is nowhere near long enough.
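To get a feel for how long a given set of values keeps the client trying, a rough worst-case estimate is count x (delay + timeout). This is only a back-of-the-envelope sketch, not something from the EMS documentation: ReconnectWindow and WorstCase are names I made up, and the real client may behave differently (for example, it may try every URL in the server list on each attempt).

```csharp
using System;

// Hypothetical helper, not part of TIBCO.EMS.dll.
public static class ReconnectWindow
{
    // Rough upper bound: assumes every attempt waits the full delay and
    // then runs into the full per-attempt timeout. Values are in milliseconds.
    public static TimeSpan WorstCase(int attemptCount, int delayMs, int timeoutMs)
    {
        long totalMs = (long)attemptCount * ((long)delayMs + timeoutMs);
        return TimeSpan.FromMilliseconds(totalMs);
    }
}
```

By this estimate, 5 attempts half a second apart give up after about 2.5 seconds, while 30 attempts with a 2-minute delay and a 2-second timeout keep going for roughly an hour.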
TWO: I believe it's important to enable the client-server and server-client heartbeats. I wasn't able to verify this, but without those in place, the client might not get the notification that the server is offline or switching in failover mode. This, of course, is a server-side setting for EMS.
THREE: You can watch for failover events by setting Tibems.SetExceptionOnFTSwitch(true); and then wiring up an exception event handler. When in a single-server environment, you will see a "Connection has been terminated" message. However, if you are in a fault-tolerant multi-server environment, you will see this: "Connection has performed fault-tolerant switch to <server url>". You don't strictly need this notification, but it can be useful (especially in testing).
FOUR: It's apparently not clear in the EMS documentation, but connection reconnect will NOT work in a single-server environment. You need to be in a multi-server, fault-tolerant environment. There is a trick, however: you can put the same server in the connection list twice. Strange, I know, but it enables the built-in reconnect logic to work.
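A minimal sketch of that trick, assuming the server list is the usual comma-separated URL string. ServerUrl and EnsureMultiple are hypothetical names of mine, and the host and port in the example further down are placeholders.

```csharp
using System;

// Hypothetical helper, not part of TIBCO.EMS.dll.
public static class ServerUrl
{
    // If the string names only one server, list it twice (comma-separated)
    // so the client sees more than one entry. A string that already
    // contains a comma is returned unchanged (apart from trimming).
    public static string EnsureMultiple(string serverUrl)
    {
        if (serverUrl == null) throw new ArgumentNullException("serverUrl");
        string trimmed = serverUrl.Trim();
        if (trimmed.Length == 0) throw new ArgumentException("Server URL is empty.", "serverUrl");
        if (trimmed.IndexOf(',') >= 0) return trimmed;
        return trimmed + "," + trimmed;
    }
}
```

For example, ServerUrl.EnsureMultiple("tcp://emshost:7222") yields "tcp://emshost:7222,tcp://emshost:7222", which can then be handed to the connection factory in place of the single URL.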
Some code:
private void initEMS()
{
    // Raise an exception event when the client switches servers during failover.
    Tibems.SetExceptionOnFTSwitch(true);
    _ConnectionFactory = new TIBCO.EMS.TopicConnectionFactory(<server>);
    _ConnectionFactory.SetReconnAttemptCount(30);      // 30 retries
    _ConnectionFactory.SetReconnAttemptDelay(120000);  // 2 minutes
    _ConnectionFactory.SetReconnAttemptTimeout(2000);  // 2 seconds
    _Connection = _ConnectionFactory.CreateTopicConnection(<username>, <password>);
    _Connection.ExceptionHandler += new EMSExceptionHandler(_Connection_ExceptionHandler);
}
private void _Connection_ExceptionHandler(object sender, EMSExceptionEventArgs args)
{
    EMSException e = args.Exception;
    // args.Exception = "Connection has been terminated" -- single server failure
    // args.Exception = "Connection has performed fault-tolerant switch to <server url>" -- fault-tolerant multi-server
    MessageBox.Show(e.ToString());
}
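If you want the handler to do more than pop up a message box, one option is to tell the two cases apart by their message text. This is only a sketch: ConnectionEvent and EmsEventClassifier are names I invented, and matching on message text is brittle because it relies on the two messages quoted above, which may differ between EMS versions.

```csharp
using System;

// Hypothetical types, not part of TIBCO.EMS.dll.
public enum ConnectionEvent { FailoverSwitch, Terminated, Other }

public static class EmsEventClassifier
{
    // Classifies an exception message by the two texts observed above.
    // Anything that matches neither is reported as Other.
    public static ConnectionEvent Classify(string message)
    {
        if (message == null) return ConnectionEvent.Other;
        if (message.IndexOf("fault-tolerant switch", StringComparison.OrdinalIgnoreCase) >= 0)
            return ConnectionEvent.FailoverSwitch;
        if (message.IndexOf("has been terminated", StringComparison.OrdinalIgnoreCase) >= 0)
            return ConnectionEvent.Terminated;
        return ConnectionEvent.Other;
    }
}
```

Inside the handler you would pass args.Exception.Message to Classify and, for instance, only log on FailoverSwitch but warn the user on Terminated.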
This post should sum up my current comments and explain my approach in more detail...
The TIBCO 'ConnectionFactory' and 'Connection' types are heavyweight, thread-safe types. TIBCO suggests that you maintain the use of one ConnectionFactory (per server configured factory) and one Connection per factory.
The server also appears to be responsible for in-place 'Connection' failover and re-connection, so let's confirm it's doing its job and then lean on that feature.
Creating a client side solution is going to be slightly more involved than fixing a server or client setup problem. All sessions you have created from a failed connection need to be re-created (not to mention producers, consumers, and destinations). There are no "reconnect" or "refresh" methods on either type. The sessions do not maintain a reference to their parent connection either.
You will have to manage a lookup of connection/session objects and go nuts re-initializing everyone, or implement some sort of session failure event handler that can get the new connection and reconnect them.
So, for now, let's dig in and see if the client is set up to receive failover notification (TIBCO EMS User's Guide, p. 292), and make sure the raised exception is caught, contains the failover URL, and is being handled properly.
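Roughly what I mean by managing a lookup, as a sketch only. Nothing here comes from TIBCO.EMS.dll; RebuildRegistry and RebuildCallback are hypothetical names, and each callback is assumed to know how to re-create its own sessions, producers, and consumers from whatever the new connection is.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; none of these types come from TIBCO.EMS.dll.
public delegate void RebuildCallback();

public class RebuildRegistry
{
    private readonly object _lock = new object();
    private readonly List<RebuildCallback> _callbacks = new List<RebuildCallback>();

    // Each component that owns sessions/producers/consumers registers
    // a callback that knows how to re-create them.
    public void Register(RebuildCallback callback)
    {
        if (callback == null) throw new ArgumentNullException("callback");
        lock (_lock) { _callbacks.Add(callback); }
    }

    // Call once a new connection is available.
    // Returns the number of callbacks that ran without throwing.
    public int RebuildAll()
    {
        RebuildCallback[] snapshot;
        lock (_lock) { snapshot = _callbacks.ToArray(); }

        int succeeded = 0;
        foreach (RebuildCallback callback in snapshot)
        {
            try
            {
                callback();
                succeeded++;
            }
            catch (Exception)
            {
                // One failed rebuild should not block the others;
                // a real app would log this.
            }
        }
        return succeeded;
    }
}
```

The connection's exception handler would be the natural place to call RebuildAll, after creating a replacement connection. That is still a lot of plumbing, which is why I would rather get the built-in failover working first.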
Client applications may receive notification of a failover by setting the tibco.tibjms.ft.switch.exception system property
Perhaps the library needs that to work?