We are connecting to an external Hazelcast cluster (version 3.7.2) using the Java Hazelcast client but are having issues reconnecting if the cluster goes down.
We are creating our client with `HazelcastClient.newHazelcastClient`. Once we do that, we keep a reference to the returned `HazelcastInstance` and use it to interact with the Hazelcast cluster (`getMap`, `getSet`, etc.). We also store the maps, sets, etc. that we get from the `HazelcastInstance` in potentially long-lived objects. Everything works fine on the happy path. However, if the cluster ever goes down and comes back up, we get `HazelcastInstanceNotActiveException` when trying to access any of these objects that were created before the cluster went down.
Is there a way to automatically re-establish the client connection when the cluster comes back online, so that we can resume using the objects (maps, sets, etc.) we previously retrieved from Hazelcast? Or do we need additional code that catches `HazelcastInstanceNotActiveException` and then rebuilds the `HazelcastInstance` and every object we have stored in the client application? The latter seems quite invasive, and we would rather not have to deal with it everywhere we store one of these Hazelcast objects.
Most of what I've read refers to the `NetworkConfig` settings for connection timeout, connection attempt limit, and connection attempt period. We are currently using the default values, but they do not seem to have any effect when accessing an object we have already retrieved: any access to a previously existing object immediately fails with `HazelcastInstanceNotActiveException`, even after the cluster is back up.
This seems like a common problem many people would run into. What is the best practice for dealing with this?
Hazelcast 3.11 introduced an exponential backoff strategy for client reconnection: https://docs.hazelcast.org/docs/latest/manual/html-single/#configuring-client-connection-retry.
```xml
<hazelcast-client>
    ...
    <connection-strategy async-start="false" reconnect-mode="ON">
        <connection-retry enabled="true">
            <initial-backoff-millis>1000</initial-backoff-millis>
            <max-backoff-millis>60000</max-backoff-millis>
            <multiplier>2</multiplier>
            <fail-on-max-backoff>true</fail-on-max-backoff>
            <jitter>0.5</jitter>
        </connection-retry>
    </connection-strategy>
    ...
</hazelcast-client>
```
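To get a feel for the delays these settings produce, here is a plain-JDK sketch of a generic exponential backoff schedule with jitter. It is not Hazelcast's implementation: the class and method names are hypothetical, and the jitter model (spreading each delay uniformly by up to `jitter` times its value in either direction) is an assumption. How Hazelcast applies `jitter` and `fail-on-max-backoff` internally may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of an exponential backoff schedule (not Hazelcast's internal code). */
final class BackoffSchedule {
    private BackoffSchedule() {}

    /** Nominal delays: initial, initial * multiplier, ..., each capped at maxMillis. */
    static List<Long> baseDelays(long initialMillis, long maxMillis, double multiplier, int attempts) {
        List<Long> delays = new ArrayList<>();
        double current = initialMillis;
        for (int i = 0; i < attempts; i++) {
            delays.add((long) Math.min(current, maxMillis));
            // Cap the running value too, so it cannot grow without bound.
            current = Math.min(current * multiplier, maxMillis);
        }
        return delays;
    }

    /** Assumed jitter model: pick a delay uniformly in [d * (1 - jitter), d * (1 + jitter)]. */
    static long applyJitter(long delayMillis, double jitter, Random random) {
        double spread = delayMillis * jitter;
        return Math.round(delayMillis - spread + 2 * spread * random.nextDouble());
    }
}
```

With the values from the XML above (1000 ms initial, multiplier 2, 60000 ms cap), `baseDelays(1000, 60000, 2.0, 8)` gives nominal waits of 1, 2, 4, 8, 16 and 32 seconds, after which they stay at 60 seconds.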
As you have already read, setting the connection attempt limit to `Integer.MAX_VALUE` and increasing the period between attempts is the way to go.
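For a 3.x client like the one in the question, that advice translates roughly into the configuration below. This is a sketch that assumes the `hazelcast-client` 3.x artifact is on the classpath; the member address, map name and 5-second period are illustrative placeholders. The idea is that a client which never exhausts its attempt limit is never shut down, so proxies obtained from it should become usable again once the cluster is back (operations issued during the outage itself will still fail or wait).

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientNetworkConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ReconnectingClient {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        ClientNetworkConfig network = clientConfig.getNetworkConfig();

        // Placeholder member address; replace with your cluster's members.
        network.addAddress("127.0.0.1:5701");
        // Effectively unlimited reconnection attempts, so the client is not
        // shut down once the default attempt limit is exhausted.
        network.setConnectionAttemptLimit(Integer.MAX_VALUE);
        // Illustrative value: wait 5000 ms between connection attempts.
        network.setConnectionAttemptPeriod(5000);

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        // A proxy obtained once and kept around; it should keep working after the
        // cluster comes back, as long as the client itself was never shut down.
        IMap<String, String> map = client.getMap("example");
        map.put("key", "value");

        client.shutdown();
    }
}
```

The same limit likely also governs the initial connection, so with these settings `newHazelcastClient` may block for a long time if no member is reachable at startup.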
At the moment there is no other way to solve this issue. I can imagine a minimal SPI for plugging in custom reconnection strategies, such as exponential backoff, but nothing like that exists yet.