Cassandra NoHostAvailableException: All host(s) tried for query failed in Production

We have 10 Cassandra nodes in production running Cassandra 2.1.8, to which we recently upgraded. Previously we were using only 3 nodes running Cassandra 2.1.2. First we upgraded the initial 3 nodes from 2.1.2 to 2.1.8 (following the procedure described in Upgrading Cassandra), then added 7 more nodes running Cassandra 2.1.8 to the cluster, and then started our client programs. Everything worked fine for the first few hours, but after that we started seeing errors like the following in the client program logs:

Thread-0 [29/07/15 17:41:23.356] ERROR  com.cleartrail.entityprofiling.engine.InterpretationWriter - Error:com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/172.50.33.161:9041, /172.50.33.162:9041, /172.50.33.95:9041, /172.50.33.96:9041, /172.50.33.165:9041, /172.50.33.166:9041, /172.50.33.163:9041, /172.50.33.164:9041, /172.50.33.42:9041, /172.50.33.167:9041] - use getErrors() for details)
       at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
       at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
       at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
       at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
       at com.cleartrail.entityprofiling.engine.InterpretationWriter.WriteInterpretation(InterpretationWriter.java:430)
       at com.cleartrail.entityprofiling.engine.Profiler.buildProfile(Profiler.java:1042)
       at com.cleartrail.messageconsumer.consumer.KafkaConsumer.run(KafkaConsumer.java:336)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/172.50.33.161:9041, /172.50.33.162:9041, /172.50.33.95:9041, /172.50.33.96:9041, /172.50.33.165:9041, /172.50.33.166:9041, /172.50.33.163:9041, /172.50.33.164:9041, /172.50.33.42:9041, /172.50.33.167:9041] - use getErrors() for details)
       at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
       at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)

I have double-checked the firewall (as suggested in a few posts), the ports, and the timeouts on both the client and the nodes, and they are all correct.
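For reference, the client-side timeouts I checked live in the driver's SocketOptions; a minimal sketch with illustrative values (not my actual settings):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.SocketOptions;

    // Illustrative values only; 172.50.33.161:9041 is taken from the log above.
    SocketOptions socketOptions = new SocketOptions()
        .setConnectTimeoutMillis(5000)     // connection establishment timeout
        .setReadTimeoutMillis(12000);      // per-request read timeout
    Cluster cluster = Cluster.builder()
        .addContactPoint("172.50.33.161")
        .withPort(9041)
        .withSocketOptions(socketOptions)
        .build();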

I am also not closing the connection anywhere in between. I am using batch queries with a batch size of 1000; the queries are UPDATE statements that increment counters in a table with three columns

entity, twfwv, cvalue

where entity and twfwv are text columns that together form the primary key and cvalue is a counter column.
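A minimal sketch of the schema and batch described above, assuming the 2.1 DataStax Java driver (the table name profile_counters and the CounterUpdate holder are illustrative, not from my actual code):

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.PreparedStatement;

    // Assumed schema (column names as above; table name hypothetical):
    //   CREATE TABLE profile_counters (
    //       entity text,
    //       twfwv  text,
    //       cvalue counter,
    //       PRIMARY KEY (entity, twfwv)
    //   );

    // Counter updates must go into a COUNTER batch, not LOGGED/UNLOGGED.
    BatchStatement batch = new BatchStatement(BatchStatement.Type.COUNTER);
    PreparedStatement ps = session.prepare(
        "UPDATE profile_counters SET cvalue = cvalue + ? WHERE entity = ? AND twfwv = ?");
    for (CounterUpdate u : updates) {   // CounterUpdate: hypothetical value holder
        batch.add(ps.bind(u.getDelta(), u.getEntity(), u.getTwfwv()));
    }
    session.execute(batch);            // ~1000 statements per batch, as described

Worth noting: a 1000-statement counter batch is large, and if the coordinator cannot complete it within the write timeout, repeated failures across hosts can surface as exactly this NoHostAvailableException.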

I even restarted all my nodes (this trick helped in my dev environment when I faced the same exception), but it is not helping. What could the probable problem be here?

asked Jul 29 '15 by abi_pat

2 Answers

My issue was resolved by putting the custom TokenAwarePolicy load balancing policy behind a configuration property, so that when the property is unset the connection falls back to the driver's default policy.

Specifically, I was trying to get a local Spring Boot app talking to a single Dockerized Cassandra instance.

    // Driver classes come from com.datastax.driver.core and
    // com.datastax.driver.core.policies.
    Cluster.Builder builder = Cluster.builder()
        .addContactPoints(cassandraProperties.getHosts())
        .withPort(cassandraProperties.getPort())
        .withProtocolVersion(ProtocolVersion.V4)
        .withRetryPolicy(new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE))
        .withCredentials(cassandraProperties.getUsername(), cassandraProperties.getPassword())
        .withCodecRegistry(codecRegistry);

    // Only install the custom policy when explicitly enabled; otherwise the
    // driver's default load balancing policy is used.
    if (loadBalanced) {
        builder.withLoadBalancingPolicy(
            new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().withLocalDc(localDc).build()));
    }
    Cluster cluster = builder.build();
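This likely works because a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with a mismatched localDc can leave the driver with no hosts it considers local, which surfaces as NoHostAvailableException; against a single Dockerized node, the default policy has no datacenter name to get wrong.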
answered Nov 15 '22 by Matt

My issue was resolved by checking the errors collection of NoHostAvailableException, as advised by Olivier Michallat in the comments. For me it was the protocol version in the cluster configuration: mine was null, and setting it to 3 fixed the problem. A sketch of both steps is below.
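A minimal sketch, assuming the DataStax Java driver (session and statement stand in for whatever query was failing):

    import java.net.InetSocketAddress;
    import java.util.Map;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ProtocolVersion;
    import com.datastax.driver.core.exceptions.NoHostAvailableException;

    try {
        session.execute(statement);
    } catch (NoHostAvailableException e) {
        // getErrors() maps each host tried to the underlying failure cause;
        // this is what pointed me at the protocol version.
        for (Map.Entry<InetSocketAddress, Throwable> err : e.getErrors().entrySet()) {
            System.err.println(err.getKey() + " -> " + err.getValue());
        }
    }

    // Pinning the protocol version explicitly; V3 corresponds to "setting it to 3".
    Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")       // illustrative contact point
        .withProtocolVersion(ProtocolVersion.V3)
        .build();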

answered Nov 15 '22 by bitsprint