How does Cassandra choose the coordinator node and the replica nodes?

How does the Cassandra client choose the coordinator node? Does the coordinator node store the data sent by the client before replicating it?

Nuwan Sudarshana asked Sep 30 '15 13:09



People also ask

How does Cassandra determine which node in a ring receives which data?

Cassandra will locate any data based on a partition key that is mapped to a token value by the partitioner. Tokens are part of a finite token ring value range where each part of the ring is owned by a node in the cluster. The node owning the range of a certain token is said to be the primary for that token.
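The token lookup described above can be sketched as a small simulation. This is only a toy model: the 256-slot token space, the four node names, and the use of MD5 as the partitioner are assumptions made for illustration, not what a real cluster is configured with.

```python
import bisect
import hashlib

# Hypothetical 4-node ring: each node owns the range that ends at its token.
RING = [(0, "node-a"), (64, "node-b"), (128, "node-c"), (192, "node-d")]
RING_SIZE = 256  # toy token space; a real cluster uses a far larger range


def token_for(partition_key: str) -> int:
    """Toy partitioner: hash the partition key into the token space."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def primary_for(token: int) -> str:
    """Return the node owning the first ring token >= token, wrapping around."""
    tokens = [t for t, _ in RING]
    index = bisect.bisect_left(tokens, token) % len(RING)
    return RING[index][1]


if __name__ == "__main__":
    key = "user:42"
    print(key, "->", primary_for(token_for(key)))
```

A token larger than the last token on the ring wraps around to the first node, which is why the index is taken modulo the ring length.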

How is Cassandra replication factor determined?

In a production system with three or more Cassandra nodes in each data center, the default replication factor for an Edge keyspace is three. As a general rule, the replication factor should not exceed the number of Cassandra nodes in the cluster.

What is the role of the coordinator node in Cassandra?

When a request is sent to any Cassandra node, that node acts as a proxy between the application (actually, the Cassandra driver) and the nodes involved in the request flow. This proxy node is called the coordinator. The coordinator is responsible for managing the entire request path and for responding to the client.

How does Cassandra handle data replication?

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor.
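As a rough illustration of replica placement, the sketch below walks the ring clockwise from the primary node until it has collected as many nodes as the replication factor. It is a simplified model in the spirit of a rack-unaware strategy; the function name and node names are hypothetical, and it ignores racks and datacenters entirely.

```python
from typing import List


def replicas_for(primary_index: int, ring_nodes: List[str], replication_factor: int) -> List[str]:
    """Pick the primary plus the next nodes clockwise on the ring (toy model)."""
    if replication_factor > len(ring_nodes):
        # Mirrors the rule of thumb that the replication factor should not
        # exceed the number of nodes available to hold replicas.
        raise ValueError("replication factor exceeds the number of nodes")
    return [
        ring_nodes[(primary_index + step) % len(ring_nodes)]
        for step in range(replication_factor)
    ]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    print(replicas_for(2, nodes, 3))
```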


2 Answers

The coordinator is selected by the driver according to the load balancing policy you have configured. Common policies are DCAwareRoundRobinPolicy and TokenAwarePolicy.

With DCAwareRoundRobinPolicy, the driver picks coordinators in round-robin order from the nodes in the local datacenter. See more here: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.html

With TokenAwarePolicy, the driver selects a coordinator that is itself a replica for the data being queried, which reduces network "hops" and latency. More info: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html

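It is a best practice to wrap policies, for example TokenAwarePolicy around DCAwareRoundRobinPolicy, so that the wrapped (child) policy serves as a fallback when the primary one cannot pick a node, for instance when the partition key of a query is not known. More information is available at the links above.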
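To make the wrapping idea concrete, here is a minimal simulation of a token-aware policy layered over a round-robin child. It is not the driver's API: the class names, the `query_plan` method, and the hard-coded replica map are all hypothetical stand-ins for what the driver learns from cluster metadata.

```python
import itertools
from typing import Dict, List, Optional


class RoundRobinPolicy:
    """Child policy: rotates the starting node on every request."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)
        self._counter = itertools.count()

    def query_plan(self, partition_key: Optional[str] = None) -> List[str]:
        offset = next(self._counter) % len(self._nodes)
        return self._nodes[offset:] + self._nodes[:offset]


class TokenAwareWrapper:
    """Primary policy: try replicas for the key first, then the child's other nodes."""

    def __init__(self, child: RoundRobinPolicy, replicas_by_key: Dict[str, List[str]]) -> None:
        self._child = child
        self._replicas_by_key = replicas_by_key  # assumed to be known up front

    def query_plan(self, partition_key: Optional[str] = None) -> List[str]:
        child_plan = self._child.query_plan(partition_key)
        replicas = self._replicas_by_key.get(partition_key, [])
        preferred = [node for node in child_plan if node in replicas]
        fallback = [node for node in child_plan if node not in replicas]
        # With no known replicas this is exactly the child's plan.
        return preferred + fallback


if __name__ == "__main__":
    policy = TokenAwareWrapper(
        RoundRobinPolicy(["n1", "n2", "n3", "n4"]),
        {"user:42": ["n3", "n4"]},
    )
    print(policy.query_plan("user:42"))  # replicas are tried first
    print(policy.query_plan())           # no key: plain round robin
```

The first node in each plan is the one that would act as coordinator; the rest are tried only if it is unavailable.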

Chris Gerlt answered Sep 22 '22 11:09



The coordinator node is typically chosen by an algorithm that takes "network distance" into account. Any node can act as the coordinator, and at first requests will be sent to the contact points your driver was configured with. But once the driver connects and understands the topology of your cluster, it may switch to a "closer" coordinator.

The coordinator only stores data locally (on a write) if it ends up being one of the nodes responsible for the data's token range.
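A small sketch of that last point, under the assumption that the replica list for the key has already been worked out. The function and the in-memory `storage` dict are hypothetical; they only model which nodes end up holding the row.

```python
from typing import Dict, List


def coordinate_write(
    coordinator: str,
    replicas: List[str],
    key: str,
    value: str,
    storage: Dict[str, Dict[str, str]],
) -> bool:
    """Send the write to every replica; report whether the coordinator kept a copy."""
    stored_locally = False
    for node in replicas:
        # Each replica persists the row in its own (simulated) local store.
        storage.setdefault(node, {})[key] = value
        if node == coordinator:
            stored_locally = True
    return stored_locally


if __name__ == "__main__":
    storage: Dict[str, Dict[str, str]] = {}
    # n1 coordinates but is not a replica, so it stores nothing.
    print(coordinate_write("n1", ["n2", "n3"], "user:42", "alice", storage))
    print(sorted(storage))
```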

Aaron answered Sep 23 '22 11:09
