 

Hazelcast OperationTimeoutException

Tags:

hazelcast

I need help properly diagnosing com.hazelcast.core.OperationTimeoutException.

com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{ serviceName='hz:impl:mapService', op=GetOperation{TRADES}, partitionId=87, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[10.32.21.170]:17326, backupsExpected=0, backupsCompleted=0}

No response has been received! backups-expected:0 backups-completed: 0

It appears the 120,000 ms timeout is configurable, but I don't think increasing it is the answer. When this happens, all calls fail for the same reason, regardless of whether the operation is a get, a set, or something else.

Can anyone recommend which parameters should be adjusted to alleviate the issue? Perhaps it is actually a thread contention issue, and increasing the event thread count or the like may help. The Hazelcast instance has no custom parameters at this time, and all thread counts are at their defaults. The server is not in excessive garbage collection when this happens either.

Pschmeltz asked Sep 20 '25 15:09



1 Answer

The most probable cause of this exception is a network problem among cluster members. An unresponsive node (because of memory or GC problems, etc.) can also cause this issue. The first thing to do is check the quality and performance of your network environment. If you are using AWS, prefer an instance type with better network performance.

If you want to get rid of problematic nodes quickly, you can set a lower value for the following system property: "hazelcast.max.no.heartbeat.seconds", the maximum time in seconds without a heartbeat before a node is assumed to be dead. Default is 500 seconds (check the documentation for your Hazelcast version, as the default differs between releases).
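
As a rough illustration of that suggestion, the property can be set as a JVM system property before the Hazelcast instance is created. This is only a sketch: the class name HeartbeatTuning, the helper configureHeartbeatTimeout, and the value of 60 seconds are made up for the example and are not recommendations from the answer. Only the property name comes from the text above.

```java
final class HeartbeatTuning {
    // Property name quoted in the answer above.
    static final String MAX_NO_HEARTBEAT = "hazelcast.max.no.heartbeat.seconds";

    /**
     * Hypothetical helper: stores the timeout as a JVM system property.
     * It has to run before the Hazelcast instance starts, because the
     * value is read when the instance is created.
     */
    static void configureHeartbeatTimeout(int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive: " + seconds);
        }
        System.setProperty(MAX_NO_HEARTBEAT, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // 60 is an illustrative value only; pick one that suits your cluster.
        configureHeartbeatTimeout(60);
        System.out.println(MAX_NO_HEARTBEAT + "=" + System.getProperty(MAX_NO_HEARTBEAT));
        // Create the Hazelcast instance after this point.
    }
}
```

The same effect should be achievable without code by passing -Dhazelcast.max.no.heartbeat.seconds=60 on the JVM command line. A lower value removes unresponsive members sooner, but it also makes the cluster more sensitive to short network hiccups or long GC pauses.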

enesness answered Sep 23 '25 11:09
