
Cassandra seed nodes and clients connecting to nodes

Tags:

cassandra

I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation.

Do clients contain only a list of the seed nodes, with each node delegating a new host for the client to connect to? Are seed nodes really only for node-to-node discovery, rather than being special nodes for clients?

Should each client use a small sample of random nodes in the DC to connect to?

Or, should each client use all the nodes in the DC?

asked May 02 '12 02:05 by gak

2 Answers

Answering my own question:

Seeds

From the FAQ:

Seeds are used during startup to discover the cluster.

Also from the DataStax documentation on "Gossip":

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.
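A toy model may make the bootstrap role concrete. The sketch below is not Cassandra's code: `ToyCluster` and `join` are hypothetical names, and real gossip exchanges versioned endpoint state rather than a plain set of addresses. It only shows that a joining node needs one reachable seed to learn the membership. Once the node has joined, seeds play no further role.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model: which nodes are reachable, and the membership view gossip has spread."""
    live: set = field(default_factory=set)
    members: set = field(default_factory=set)


def join(new_node: str, seeds: list, cluster: ToyCluster) -> set:
    """Hypothetical bootstrap: ask the configured seeds, in order, for the ring membership."""
    for seed in seeds:
        if seed in cluster.live:
            # A live seed answers with its view of the cluster; the newcomer is added to it.
            cluster.members.add(new_node)
            cluster.live.add(new_node)
            return set(cluster.members)
    # With no reachable seed, the newcomer knows no other address, so it cannot join.
    raise RuntimeError(f"{new_node}: no live seed among {seeds}")
```

If the first seed in the list is down, the next one is tried. Only when every seed is unreachable does the join fail.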

From these details it seems that a seed is nothing special to clients.

Clients

From the DataStax documentation on client requests:

All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation.

The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy.
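The coordinator's routing decision can be sketched as follows. The sketch assumes one token per node (no vnodes) and a SimpleStrategy-like placement: take the node that owns the key's token, then walk the ring clockwise. The hash is a stand-in and is not Cassandra's partitioner (the default is Murmur3Partitioner). `Ring`, `token_for`, and `coordinate` are hypothetical names. The point it illustrates is that the same replicas are chosen no matter which node coordinates.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Stand-in partitioner: first 8 bytes of the key's MD5 digest as an unsigned integer."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens: dict):
        # node_tokens maps node name -> the single token that node owns.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas(self, key: str, rf: int) -> list:
        """Primary replica = first node whose token >= the key's token (wrapping);
        further replicas are the next nodes clockwise."""
        n = len(self.entries)
        start = bisect.bisect_left(self.tokens, token_for(key)) % n
        return [self.entries[(start + i) % n][1] for i in range(min(rf, n))]


def coordinate(coordinator: str, key: str, ring: Ring, rf: int = 3) -> dict:
    """Any node can coordinate: it computes the replicas and forwards the request to them."""
    return {"coordinator": coordinator, "forward_to": ring.replicas(key, rf)}
```

For example, with `Ring({"A": 0, "B": 2**62, "C": 2**63, "D": 3 * 2**62})`, calling `coordinate("A", "user:42", ring)` and `coordinate("C", "user:42", ring)` yields the same `forward_to` list.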

I gather that the pool of nodes a client connects to can be just a handful of (random?) nodes in the DC, so that the client can tolerate individual node failures.
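Real drivers handle contact points and failover internally, so the following is only a toy model of the idea above: try a handful of configured contact points until one responds. `connect` and its `is_up` callback are hypothetical. The optional shuffle is an assumption, included to spread initial connections across nodes.

```python
import random


def connect(contact_points: list, is_up, rng=None) -> str:
    """Return the first reachable contact point, optionally in shuffled order."""
    order = list(contact_points)
    if rng is not None:
        rng.shuffle(order)  # avoid every client hitting the same node first
    for host in order:
        if is_up(host):
            return host
    raise ConnectionError("none of the contact points is reachable")
```

A single dead node is skipped transparently. The client fails only if every listed contact point is down, which is why listing a few nodes is enough.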

answered Sep 22 '22 17:09 by gak


Seed nodes serve two purposes.

  1. They act as a place for new nodes to announce themselves to a cluster. Without at least one live seed node, no new node can join the cluster, because it has no idea how to contact the non-seed nodes to get the cluster status.
  2. Seed nodes act as gossip hot spots. Since nodes gossip more often with seeds than with non-seeds, the seeds tend to have more current information, and therefore so does the whole cluster. This is why you should not make every node a seed. Similarly, it is also why all nodes in a given data center should have the same list of seed nodes in their cassandra.yaml file. Typically, 3 seed nodes per data center is ideal.
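The hot-spot effect in point 2 can be simulated. The sketch below is loosely modeled on the behavior described there. Each round, a node gossips with a random peer, and it also gossips with a random seed if that peer was not a seed. It is not Cassandra's Gossiper, and all function names are hypothetical. Counting how often each node is contacted shows the seeds receiving noticeably more gossip.

```python
import random
from collections import Counter


def gossip_targets(me, nodes, seeds, rng):
    """One round for node `me`: a random peer, plus a random seed if that peer was not a seed."""
    peers = [n for n in nodes if n != me]
    targets = [rng.choice(peers)]
    if targets[0] not in seeds:
        seed_peers = [s for s in seeds if s != me]
        if seed_peers:
            targets.append(rng.choice(seed_peers))
    return targets


def contact_counts(nodes, seeds, rounds, seed=0):
    """Count how many times each node is gossiped to over `rounds` rounds."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    counts = Counter()
    for _ in range(rounds):
        for me in nodes:
            counts.update(gossip_targets(me, nodes, seeds, rng))
    return counts
```

With 9 nodes and 3 seeds, each seed ends up contacted roughly three times as often as a non-seed. Passing `seeds=set(nodes)` removes the extra seed contact, so every node is contacted about equally often. In this model, that is why making every node a seed defeats the purpose.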

The Cassandra client contact points simply provide the cluster topology to the client, after which the client may connect to any node in the cluster. As such, they are similar to seed nodes, and it makes sense to use the same nodes both as seeds and as client contact points. However, you can safely configure as many contact points as you like. The only other consideration is that the first node a client contacts sets its data center affinity, so you may wish to order your contact points to prefer a given data center.
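The contact-point behavior described above can be modeled in a few lines. This is a hypothetical sketch: `Node`, `order_contact_points`, and `bootstrap_client` are illustrative names. The first reachable contact point hands over the full topology, and, following the answer, its data center becomes the client's local one. Some driver versions instead require the local data center to be configured explicitly, so check your driver's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    address: str
    dc: str


def order_contact_points(points, preferred_dc):
    """Stable sort: contact points in the preferred data center come first."""
    return sorted(points, key=lambda n: n.dc != preferred_dc)


def bootstrap_client(points, topology, is_up):
    """The first reachable contact point supplies the whole topology and fixes the local DC."""
    for p in points:
        if is_up(p):
            return {"local_dc": p.dc, "known_nodes": list(topology)}
    raise ConnectionError("no contact point reachable")
```

If the preferred data center's contact point is down, the client falls back to the next one and, in this model, ends up with that node's data center as its local one. That is the trade-off the ordering advice is about.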

For more details about contact points, see this question: Cassandra Java driver: how many contact points is reasonable?

answered Sep 18 '22 17:09 by james turner