
Problems connecting to Cassandra pool from Spring application

I hope someone is able to help, because I'm currently stuck trying to work with Cassandra.

My set-up: For development, I have a minimal Cassandra 3.0.4 cluster with two nodes (one on my working machine, one in a VM). Usually only the local one is up and running. I use the latest Java driver, version 3.0.0, to connect to the pool.

My cassandra.yaml sets rpc_address and listen_address to the IP of each node. The seed is my primary working machine.
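For reference, the relevant cassandra.yaml entries on the primary node might look roughly like this (the IPs are taken from the log output below; the seed list is an assumption based on the description):

```yaml
# cassandra.yaml on the primary working machine (10.20.30.74)
listen_address: 10.20.30.74
rpc_address: 10.20.30.74
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.20.30.74"
```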

My problem: Everything works fine from cqlsh (at any time), and also from Java when both nodes are up and running. But as soon as I stop the one in the VM, my Spring-based application throws errors during startup:

2016-03-29 09:05:33.515 | INFO  | main                 | com.datastax.driver.core.NettyUtil                          :83    | Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
2016-03-29 09:05:34.147 | INFO  | main                 | com.datastax.driver.core.policies.DCAwareRoundRobinPolicy   :95    | Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
2016-03-29 09:05:34.149 | INFO  | main                 | com.datastax.driver.core.Cluster$Manager                    :1475  | New Cassandra host /10.20.30.74:9042 added
2016-03-29 09:05:34.149 | INFO  | main                 | com.datastax.driver.core.Cluster$Manager                    :1475  | New Cassandra host /10.20.30.77:9042 added
2016-03-29 09:05:34.150 | INFO  | main                 | my_company.cassandra.dao.impl.CassandraDaoImpl     :55    | Connected to cluster: TestCaseCluster
2016-03-29 09:05:34.151 | INFO  | main                 | my_company.cassandra.dao.impl.CassandraDaoImpl     :57    | Datacenter: datacenter1; Host: /10.20.30.74; Rack: rack1, State: UP|true
2016-03-29 09:05:34.151 | INFO  | main                 | my_company.cassandra.dao.impl.CassandraDaoImpl     :57    | Datacenter: datacenter1; Host: /10.20.30.77; Rack: rack1, State: UP|true
2016-03-29 09:05:34.220 | WARN  | luster1-nio-worker-2 | com.datastax.driver.core.SessionManager$7                   :378   | Error creating pool to /10.20.30.77:9042
com.datastax.driver.core.exceptions.ConnectionException: [/10.20.30.77] Pool was closed during initialization
    at com.datastax.driver.core.HostConnectionPool$2.onSuccess(HostConnectionPool.java:149) [cassandra-driver-core-3.0.0.jar:?]
    at com.datastax.driver.core.HostConnectionPool$2.onSuccess(HostConnectionPool.java:135) [cassandra-driver-core-3.0.0.jar:?]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1181) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$CombinedFuture.setOneValue(Futures.java:1626) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$CombinedFuture.access$400(Futures.java:1470) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$CombinedFuture$2.run(Futures.java:1548) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$FallbackFuture$1$1.onSuccess(Futures.java:475) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1181) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:102) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1184) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$FallbackFuture$1.onFailure(Futures.java:472) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$FallbackFuture$1$1.onFailure(Futures.java:483) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:101) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:170) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1184) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$FallbackFuture$1.onFailure(Futures.java:472) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:857) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68) [guava-16.0.1.jar:?]
    at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:157) [cassandra-driver-core-3.0.0.jar:?]
    at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:140) [cassandra-driver-core-3.0.0.jar:?]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:276) [netty-transport-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:292) [netty-transport-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) [netty-transport-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.33.Final.jar:4.0.33.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-common-4.0.33.Final.jar:4.0.33.Final]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]

In this example, I find the following line interesting:

Datacenter: datacenter1; Host: /10.20.30.77; Rack: rack1, State: UP|true

Because this is the aforementioned VM, which is actually down:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
DN  10.20.30.77  89.12 KB   256          100.0%            197f6f0f-b820-4ab8-b7ef-bcc8773a345c  rack1
UN  10.20.30.74  96.26 KB   256          100.0%            db7d053b-f8d1-4a59-9cb2-3abf54b24687  rack1

where DN means "Down" and "Normal", according to this excerpt from nodetool status. So as far as I understand it, the Java driver doesn't recognize the second node as down and still tries to connect to it, because it's in the list of (available) nodes.
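For reference, a minimal connection sketch against this set-up (it assumes cassandra-driver-core 3.0.0 on the classpath; the contact points are taken from the logs, everything else is left at defaults). This is roughly the kind of code that produces the "Connected to cluster" and host-state log lines above:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Session;

public class CassandraConnect {
    public static void main(String[] args) {
        // Contact points: both nodes; the driver only needs one of them reachable
        Cluster cluster = Cluster.builder()
                .addContactPoints("10.20.30.74", "10.20.30.77")
                .build();
        Session session = cluster.connect();
        System.out.println("Connected to cluster: "
                + cluster.getMetadata().getClusterName());
        // Log what the driver believes about each known host
        for (Host host : cluster.getMetadata().getAllHosts()) {
            System.out.printf("Datacenter: %s; Host: %s; Rack: %s, State: %s|%b%n",
                    host.getDatacenter(), host.getAddress(), host.getRack(),
                    host.getState(), host.isUp());
        }
        cluster.close();
    }
}
```

Note that the driver learns the full peer list from the contact node's system tables, which is why the stopped VM still shows up as a known host and gets a connection attempt.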

Can this be a compatibility issue between the driver version and the Cassandra version? But I thought they were compatible: DataStax Java driver documentation on GitHub

Please ask, if you need more info. I will update this text accordingly.

Thanks and regards.

Daniel

edit1: I have initialized a keyspace with replication class SimpleStrategy and factor 3. I read somewhere that this number should not exceed the number of nodes (I think it was in the documentation, but I don't have the link anymore)... Can this be the reason?
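Why the factor matters: many operations require a quorum of replicas to acknowledge, computed as floor(RF / 2) + 1. A small standalone sanity check (plain Java, no driver involved; the numbers mirror the set-up described above):

```java
public class QuorumCheck {
    // Replicas that must acknowledge at QUORUM consistency: floor(rf / 2) + 1
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // Can a QUORUM request succeed with the given number of live replica nodes?
    static boolean quorumAvailable(int rf, int liveReplicas) {
        return liveReplicas >= quorum(rf);
    }

    public static void main(String[] args) {
        // RF=3 keyspace on a 2-node cluster, one node down -> 1 live replica
        System.out.println(quorum(3));             // 2 replicas needed for QUORUM
        System.out.println(quorumAvailable(3, 1)); // one node down: QUORUM fails
        System.out.println(quorumAvailable(3, 2)); // both nodes up: QUORUM works
    }
}
```

With consistency level ONE, a single live replica is enough, which is why queries can still succeed here even with the VM node down.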

asked Mar 29 '16 by dzim


2 Answers

A pity that no one seems to know about this kind of problem. After several attempts and searches on the Internet (where I found close to nothing on this particular problem), I was close to giving up on the idea.

But.

Then two things came to my attention:

  1. The replication factor for my test keyspace was 3, while I only had two nodes. Not very sensible.
  2. If I had looked closely at the exception, I would have seen that it is a warning, not a fatal error.
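Point 1 can be corrected by altering the keyspace; a sketch, assuming the keyspace is named test_ks (the real name isn't given in the question):

```sql
-- Match the replication factor to the two available nodes
ALTER KEYSPACE test_ks
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

-- Afterwards, run "nodetool repair test_ks" on each node so existing
-- data conforms to the new replication factor.
```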

So what?

I was still able to connect to the cluster and actually query it, but I always gave up too early because of this exception.

Almost "Much Ado About Nothing".

Everything is now working so far, and I could continue developing my application, as well as learn a lot about this high-availability NoSQL database and where it differs from "classic" relational databases, even if the query language has many similarities. It's quite exciting!

So: Sorry for the fuss!

Cheers, Daniel

answered Oct 13 '22 by dzim


I tried reading more about the issue you faced, since I was facing the same one. I had a cluster of 4 nodes and was encountering this problem with one of them. I took the two steps below to avoid getting this error:

  1. Removed the failing node from the contact-point list I was passing while creating the Cassandra Cluster in my Java class.
  2. Removed the node from my cluster configuration in Cassandra. #1 will not make any difference unless the node has also been removed from the cluster.

Conversely, you should actually fix the node if it is down and cannot be started. If the node is not required, it should be removed from the cluster, and then it will no longer pop up as a warning while starting the service. I think Cassandra needs a permanent fix for this warning, since a node that is down should not be needed to create a session. On the other hand, it is just a warning and can be ignored if nothing looks suspicious in your application.
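The clean-up described in step 2 can be done with nodetool, using the host ID reported by nodetool status (the ID below is the one from the question's output; substitute your own):

```shell
# On a live node: check cluster state and note the Host ID of the dead node
nodetool status

# Permanently remove the dead node from the ring
nodetool removenode 197f6f0f-b820-4ab8-b7ef-bcc8773a345c
```

Only use removenode while the node is down; a running node should be decommissioned with "nodetool decommission" on the node itself instead.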

answered Oct 13 '22 by Bhaskar