
Cassandra Java driver: how many contact points is reasonable?

In Java I connect to a Cassandra cluster like this:

Cluster cluster = Cluster.builder().addContactPoints("host-001","host-002").build();

Do I need to specify every host of the cluster there? What if I have a cluster of 1000 nodes? Do I pick a few at random? If so, how many, and should the choice really be random?

henry asked Nov 10 '14 20:11




2 Answers

I would say that configuring your client with the same list of nodes that you configured as Cassandra's seed nodes will give you the best results.

As you know, Cassandra nodes use the seed nodes to find each other and discover the topology of the ring. The driver uses only one of the nodes in the list to establish the control connection, which is the connection it uses to discover the cluster topology. Providing the client with all the seed nodes still helps: if one of them is down, the driver can fall back to another, so the client is more likely to keep operating when nodes fail.
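As a minimal sketch of the suggestion above, the seed list can be turned into the contact-point array once and reused. The class name `SeedContactPoints`, the helper `fromSeedList`, and the host names are hypothetical, chosen only for illustration; the driver call appears only as a comment and mirrors the one in the question.

```java
import java.util.ArrayList;
import java.util.List;

class SeedContactPoints {
    // Splits a comma-separated seed list (the same shape as the "seeds"
    // value in cassandra.yaml) into host names, skipping empty entries.
    static String[] fromSeedList(String seeds) {
        List<String> hosts = new ArrayList<>();
        for (String part : seeds.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical seed list; substitute the one from your own cluster.
        String[] contactPoints = fromSeedList("host-001, host-002, host-003");
        // With the driver on the classpath, the array would be passed on
        // as in the question:
        //   Cluster cluster = Cluster.builder().addContactPoints(contactPoints).build();
        System.out.println(String.join(",", contactPoints));
    }
}
```

Keeping the client's list derived from the same source as the cluster's seed configuration means the two cannot drift apart unnoticed.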

Alex Popescu answered Sep 24 '22 23:09


My approach is to add as many nodes as I can. The reason is simple: seeds are necessary only for cluster boot, and once the cluster is up and running they are just ordinary nodes. Listing only the seeds may make it impossible to connect to an otherwise working cluster if those particular nodes happen to be down. So I give myself the best chance of connecting by keeping a more than reasonable number of nodes in the list; a single working node is enough to get the current cluster configuration.
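The reasoning above can be illustrated with a small simulation. This is not the driver's actual connection logic, only a sketch of the idea that one reachable contact point is enough. The class `FirstReachable`, the method `firstReachable`, and the `isUp` predicate (a stand-in for a real connection attempt) are all hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class FirstReachable {
    // Returns the first contact point for which isUp reports success,
    // or an empty Optional if none of them is reachable.
    static Optional<String> firstReachable(List<String> contactPoints,
                                           Predicate<String> isUp) {
        for (String host : contactPoints) {
            if (isUp.test(host)) {
                return Optional.of(host);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> contactPoints = Arrays.asList("host-001", "host-002", "host-003");
        // Simulated outage: pretend only host-003 is currently up.
        Predicate<String> isUp = host -> host.equals("host-003");
        System.out.println(firstReachable(contactPoints, isUp).orElse("no host reachable"));
    }
}
```

With only `host-001` and `host-002` in the list, the same simulated outage would leave the client unable to connect, even though the cluster itself is still serving requests.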

Carlo Bertuccini answered Sep 24 '22 23:09