I need help properly diagnosing com.hazelcast.core.OperationTimeoutException.
com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{ serviceName='hz:impl:mapService', op=GetOperation{TRADES}, partitionId=87, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[10.32.21.170]:17326, backupsExpected=0, backupsCompleted=0}
No response has been received! backups-expected:0 backups-completed: 0
It appears the 120,000 ms is configurable, but I don't think increasing it is the answer. When this happens, all calls fail for the same reason, regardless of whether they are get or set operations.
Can anyone recommend which parameters should be adjusted to alleviate the issue? Perhaps it is actually a thread contention issue, and increasing the event threads or the like may help. The Hazelcast instance has no custom parameters at this time. Thread counts are all default. The server is not in excessive garbage collection at the time either.
The most probable cause of this exception is a network problem among cluster members. An unresponsive node (because of memory or GC problems, etc.) can also cause this issue. The first thing to do is check the quality and performance of your network environment. If you are using AWS, you can choose an instance type with better network performance.
If you want to get rid of problematic nodes quickly, you can set a lower value for the following system property: "hazelcast.max.no.heartbeat.seconds", the maximum time in seconds without a heartbeat before a node is assumed dead. Default is 500 seconds.
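As a rough sketch of how that property could be lowered: it is an ordinary JVM system property, so it can be passed on the command line (for example -Dhazelcast.max.no.heartbeat.seconds=60) or set programmatically before the Hazelcast member starts. The class name, the helper method, and the value of 60 seconds below are illustrative assumptions, not recommendations from the answer. Pick a value that suits your network and GC pauses.

```java
public class HeartbeatTimeoutConfig {

    // Property name taken from the answer above.
    static final String MAX_NO_HEARTBEAT_PROPERTY = "hazelcast.max.no.heartbeat.seconds";

    // Hypothetical helper: sets the property as a JVM system property.
    // It must run before the Hazelcast instance is created to take effect.
    static void applyHeartbeatTimeout(int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive, got " + seconds);
        }
        System.setProperty(MAX_NO_HEARTBEAT_PROPERTY, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // 60 is an assumed example value, not a tuned recommendation.
        applyHeartbeatTimeout(60);
        System.out.println(MAX_NO_HEARTBEAT_PROPERTY + "="
                + System.getProperty(MAX_NO_HEARTBEAT_PROPERTY));
        // Start the Hazelcast member after this point so it picks up the property.
    }
}
```

Setting the value too low risks evicting healthy nodes during a long GC pause or a brief network hiccup, so it is worth lowering it gradually while watching the member logs.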