Cassandra seed nodes and clients connecting to nodes

Tags:

cassandra

I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation.

Do clients hold only a list of the seed nodes, with each seed then delegating some other host for the client to connect to? Or are seed nodes really only for node-to-node discovery, rather than being special nodes for clients?

Should each client use a small sample of random nodes in the DC to connect to?

Or, should each client use all the nodes in the DC?


asked May 02 '12 02:05

gak

2 Answers

Answering my own question:

Seeds

From the FAQ:

Seeds are used during startup to discover the cluster.

Also from the DataStax documentation on "Gossip":

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.

From these details it seems that a seed is nothing special to clients.
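To make the bootstrapping role concrete, here is a toy Python model. It is my own sketch under simplifying assumptions, not Cassandra's actual gossip code, and `Node` and `join_cluster` are hypothetical names. A new node walks its configured seed list, copies the membership view from the first live seed, and from then on can talk to any member. Nothing in it involves clients.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy cluster member: an address, an up/down flag, and the peers it knows."""
    address: str
    up: bool = True
    known_peers: set = field(default_factory=set)


def join_cluster(new_node: Node, seed_addresses: list, cluster: dict) -> bool:
    """Bootstrap `new_node` by contacting its configured seeds in order.

    `cluster` maps address -> Node and stands in for the network (hypothetical).
    The first live seed hands over its view of the membership; after that the
    new node can gossip with any member, seed or not.
    """
    for addr in seed_addresses:
        seed = cluster.get(addr)
        if seed is not None and seed.up:
            # Copy the seed's membership view and add the seed itself.
            new_node.known_peers = set(seed.known_peers) | {seed.address}
            # The seed now knows the newcomer; normal gossip would spread it further.
            seed.known_peers.add(new_node.address)
            cluster[new_node.address] = new_node
            return True
    # No live seed: the new node has no way to locate the non-seed members.
    return False


# Three existing members that all know each other; 10.0.0.1 is the only seed.
members = {a: Node(a) for a in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
for n in members.values():
    n.known_peers = set(members) - {n.address}

newcomer = Node("10.0.0.9")
print(join_cluster(newcomer, ["10.0.0.1"], members))  # True
print(sorted(newcomer.known_peers))
```

Once joined, the seed plays no further special role in this model, which matches the documentation's point that seeds are not a single point of failure.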

Clients

From the DataStax documentation on client requests:

All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation.

The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy.
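A rough Python sketch of that coordinator role follows. It assumes a single token per node, an MD5-based token loosely in the spirit of RandomPartitioner, and SimpleStrategy-style placement (walk the ring clockwise from the key's token). Real clusters typically use vnodes, Murmur3Partitioner (the default since 1.2) and NetworkTopologyStrategy, so treat `Ring`, `token_for` and `coordinate` as illustrative, hypothetical names.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Place a partition key on the ring using MD5 (loosely like RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2 ** 127)


class Ring:
    def __init__(self, node_tokens: dict):
        # node -> single token; real clusters usually assign many vnodes per node.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas(self, key: str, rf: int) -> list:
        """SimpleStrategy-style placement: the first node whose token is >= the
        key's token is the primary replica; walk clockwise for the rest."""
        n = len(self.entries)
        start = bisect.bisect_left(self.tokens, token_for(key)) % n  # wraps past the last token
        return [self.entries[(start + i) % n][1] for i in range(min(rf, n))]


def coordinate(ring: Ring, contacted: str, key: str, rf: int = 3) -> dict:
    """Whichever node the client contacted becomes the coordinator and forwards
    the request to the replicas that own the key; it need not own the key itself."""
    return {"coordinator": contacted, "forward_to": ring.replicas(key, rf)}


ring = Ring({"A": 0, "B": 2 ** 125, "C": 2 ** 126, "D": 3 * 2 ** 125})
print(coordinate(ring, "A", "user:42"))
print(coordinate(ring, "D", "user:42"))  # same replicas, different coordinator
```

The point of the example is that the replica list depends only on the key, never on which node the client happened to contact.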

I gather that the pool of nodes a client connects to can be just a handful of (random?) nodes in the DC, so that the client can still connect if some of them fail.
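A minimal sketch of that idea, assuming hypothetical `is_reachable` and `fetch_peers` callbacks in place of a real driver's networking: the client tries its short list of contact points in order, and the first one that answers reports the rest of the cluster.

```python
from typing import Callable, Dict, List, Sequence


def discover_cluster(contact_points: Sequence[str],
                     is_reachable: Callable[[str], bool],
                     fetch_peers: Callable[[str], List[str]]) -> List[str]:
    """Try contact points in order; the first one that answers supplies the
    rest of the topology. Both callbacks are hypothetical stand-ins for a real
    driver's connection and peer-discovery steps."""
    for host in contact_points:
        if is_reachable(host):
            # Any member knows the full membership, so one live contact suffices.
            peers = [p for p in fetch_peers(host) if p != host]
            return [host] + peers
    raise ConnectionError("none of the contact points could be reached")


# Five-node cluster; the client is configured with only three of the nodes.
all_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
down = {"10.0.0.1"}
topology: Dict[str, List[str]] = {h: [p for p in all_nodes if p != h] for h in all_nodes}

hosts = discover_cluster(["10.0.0.1", "10.0.0.3", "10.0.0.5"],
                         is_reachable=lambda h: h not in down,
                         fetch_peers=lambda h: topology[h])
print(hosts)
```

Only one of the listed contact points has to be up; the rest are there purely for redundancy.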


answered Sep 22 '22 17:09

gak

Seed nodes serve two purposes:

1. They act as a place for new nodes to announce themselves to a cluster. Without at least one live seed node, no new nodes can join the cluster, because they have no idea how to contact the non-seed nodes to get the cluster status.
2. They act as gossip hot spots. Since nodes gossip more often with seeds than with non-seeds, the seeds tend to have more current information, and therefore so does the whole cluster. This is why you should not make every node a seed, and also why all nodes in a given data center should have the same list of seed nodes in their cassandra.yaml file. Typically, 3 seed nodes per data center is ideal.
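A toy simulation of the "hot spot" effect. The per-round rule is an assumption loosely modeled on Cassandra's Gossiper (each node gossips with one random peer, plus a random seed if that peer was not a seed); the real implementation has more cases, such as gossiping with unreachable nodes. It counts how many messages each node receives, so you can see the seeds getting far more traffic, and that the extra traffic vanishes when every node is a seed.

```python
import random
from collections import Counter


def simulate_gossip(nodes, seeds, rounds, rng):
    """Count gossip messages received per node.

    Simplified per-round rule (an assumption, loosely modeled on Cassandra's
    Gossiper): each node messages one random peer, and if that peer was not a
    seed it also messages one random seed other than itself.
    """
    received = Counter({n: 0 for n in nodes})
    for _ in range(rounds):
        for node in nodes:
            peer = rng.choice([n for n in nodes if n != node])
            received[peer] += 1
            if peer not in seeds:
                # sorted() keeps the run reproducible for a given rng seed.
                candidates = [s for s in sorted(seeds) if s != node]
                if candidates:
                    received[rng.choice(candidates)] += 1
    return received


nodes = [f"n{i}" for i in range(10)]
seeds = {"n0", "n1", "n2"}
counts = simulate_gossip(nodes, seeds, rounds=1000, rng=random.Random(42))

seed_avg = sum(counts[s] for s in seeds) / len(seeds)
other_avg = sum(counts[n] for n in nodes if n not in seeds) / (len(nodes) - len(seeds))
print(f"seed avg {seed_avg:.0f}, non-seed avg {other_avg:.0f}")
```

If every node were a seed, no node would be favored, so the "more current information" benefit described above would be lost.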

The Cassandra client contact points simply let the client discover the cluster topology; after that, the client may connect to any node in the cluster. In that sense they are similar to seed nodes, and it makes sense to use the same nodes for both seeds and client contact points. However, you can safely configure as many client contact points as you like. The only other consideration is that the first node a client contacts sets its data center affinity, so you may wish to order your contact points to prefer a given data center.
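A sketch of the data center affinity described above, assuming the client takes the data center of the first reachable contact point as its local DC and prefers hosts in that DC (`dc_aware_plan` is a hypothetical name, not a driver API). Drivers differ here, and some let or require you to set the local data center explicitly, so check your driver's load-balancing documentation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def dc_aware_plan(contact_points: Sequence[str],
                  dc_of: Dict[str, str],
                  all_nodes: Sequence[str],
                  is_reachable: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """The first reachable contact point fixes the client's local DC; hosts in
    that DC come first in the plan, remote-DC hosts after them."""
    first = next((h for h in contact_points if is_reachable(h)), None)
    if first is None:
        raise ConnectionError("none of the contact points could be reached")
    local_dc = dc_of[first]
    local = [h for h in all_nodes if dc_of[h] == local_dc]
    remote = [h for h in all_nodes if dc_of[h] != local_dc]
    return local_dc, local + remote


def always_up(host: str) -> bool:
    return True


dc_of = {"a1": "dc1", "a2": "dc1", "b1": "dc2", "b2": "dc2"}
nodes = ["a1", "a2", "b1", "b2"]

print(dc_aware_plan(["b1", "a1"], dc_of, nodes, always_up))  # ('dc2', ['b1', 'b2', 'a1', 'a2'])
print(dc_aware_plan(["a1", "b1"], dc_of, nodes, always_up))  # ('dc1', ['a1', 'a2', 'b1', 'b2'])
```

Swapping the order of the contact points is enough to flip the preferred data center in this model, which is why ordering them deliberately matters.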

For more details about contact points, see this question: Cassandra Java driver: how many contact points is reasonable?

answered Sep 18 '22 17:09

james turner
