I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation.
Do the clients only contain a list of the seed node and each node delegates a new host for the client to connect to? Are seed nodes only really for node to node discovery, rather than a special node for clients?
Should each client use a small sample of random nodes in the DC to connect to?
Or, should each client use all the nodes in the DC?
A seed node is used to bootstrap the gossip process for new nodes joining a cluster. To learn the topology of the ring, a joining node contacts one of the nodes in the -seeds list in cassandra. yaml. The first time you bring up a node in a new cluster, only one node is the seed node.
In Cassandra, the data itself is automatically distributed, with (positive) performance consequences. It accomplishes this using partitions. Each node owns a particular set of tokens, and Cassandra distributes data based on the ranges of these tokens across the cluster.
In most cases, you'll want to operate with strong consistency and so need at least 3 nodes. This allows your Cassandra service to continue uninterrupted if you suffer a hardware failure or some other loss of the Cassandra service on a single node (this will happen sooner or later).
A seed node is a special node that allows the incorporation of new nodes to the network and maintains the strength of the network at all times, by allowing them to synchronize and obtain a copy of the data of the network. blockchain, replicating it and adding strength and security to it.
Answering my own question:
Seeds
From the FAQ:
Seeds are used during startup to discover the cluster.
Also from the DataStax documentation on "Gossip":
The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.
From these details it seems that a seed is nothing special to clients.
Clients
From the DataStax documentation on client requests:
All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation.
The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy.
I gather that the pool of nodes that a client connects to can just be a handful of (random?) nodes in the DC to allow for potential failures.
seed nodes serve two purposes.
the cassandra client contact points simply provide the cluster topology to the client, after which the client may connect to any node in the cluster. as such, they are similar to seed nodes and it makes sense to use the same nodes for both seeds and client contacts. however, you can safely configure as many cassandra client contact points as you like. the only other consideration is that the first node a client contacts sets its data center affinity, so you may wish to order your contact points to prefer a given data center.
for more details about contact points see this question: Cassandra Java driver: how many contact points is reasonable?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With