How does the Cassandra client choose the coordinator node? Does the coordinator node store the data sent by the client before replicating it?
Cassandra locates any data based on a partition key that is mapped to a token value by the partitioner. Tokens are values in a finite token ring, and each range of the ring is owned by a node in the cluster. The node owning the range that contains a given token is said to be the primary for that token.
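You can see this mapping in practice through the driver's cluster metadata. Here is a minimal Java sketch against the DataStax Java driver 2.1 referenced below; the contact point, the keyspace name "ks", and the key "user123" are placeholders, not values from the question:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.Set;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Host;
    import com.datastax.driver.core.Metadata;

    public class ReplicaLookup {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")  // placeholder contact point
                    .build();
            try {
                Metadata metadata = cluster.getMetadata();
                // For a text partition key, the serialized form is simply its UTF-8 bytes.
                ByteBuffer key = ByteBuffer.wrap("user123".getBytes(StandardCharsets.UTF_8));
                // The nodes owning the token range of this key in keyspace "ks"
                Set<Host> replicas = metadata.getReplicas("ks", key);
                for (Host host : replicas) {
                    System.out.println("Replica: " + host.getAddress());
                }
            } finally {
                cluster.close();
            }
        }
    }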
In a production system with three or more Cassandra nodes in each data center, the default replication factor for an Edge keyspace is three. As a general rule, the replication factor should not exceed the number of Cassandra nodes in the cluster.
When a request is sent to any Cassandra node, that node acts as a proxy between the application (actually, the Cassandra driver) and the nodes involved in the request. This proxy node is called the coordinator. The coordinator is responsible for managing the entire request path and for responding to the client.
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor.
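The replication factor is set per keyspace when it is created. As an illustration (the keyspace name "ks" and the data center name "dc1" are assumed placeholders), the following sketch creates a keyspace where each data center keeps three replicas of every row:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateKeyspace {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // NetworkTopologyStrategy places replicas per data center;
            // here the data center "dc1" keeps 3 copies of every row.
            session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");
            cluster.close();
        }
    }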
The coordinator is selected by the driver based on the load-balancing policy you have set. Common policies are DCAwareRoundRobinPolicy and TokenAwarePolicy.
With DCAwareRoundRobinPolicy, the driver selects the coordinator node in round-robin order among the nodes in the local data center. See more here: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.html
With TokenAwarePolicy, the driver selects a coordinator node that owns a replica of the data being queried, reducing network "hops" and latency. More info: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html
It is a best practice to wrap policies so that there is a primary and a fallback policy should one be unable to pick a node; see the example below. More information is available at the links above.
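For example, with the Java driver 2.1 documented above, a common setup is to wrap TokenAwarePolicy around DCAwareRoundRobinPolicy, so replica-aware routing is tried first and local-DC round robin is the fallback. A minimal sketch (the contact point and the data center name "DC1" are assumptions):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class PolicyWrapping {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")  // placeholder contact point
                    // TokenAwarePolicy prefers a replica of the statement's partition key;
                    // when it cannot determine one, it delegates to the wrapped
                    // DCAwareRoundRobinPolicy, which round-robins over the local DC.
                    .withLoadBalancingPolicy(
                            new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1")))
                    .build();
            cluster.close();
        }
    }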
The coordinator node is typically chosen by an algorithm that takes "network distance" into account. Any node can act as the coordinator; at first, requests are sent to the nodes your driver already knows about, but once it connects and learns the topology of your cluster, it may switch to a "closer" coordinator.
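One way to make "network distance" explicit with this driver is LatencyAwarePolicy, which tracks measured response times per node and avoids the slow ones. This is a sketch, not the specific algorithm the answer above refers to; the exclusion threshold and names are illustrative choices:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.LatencyAwarePolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class LatencyAwareSetup {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")  // placeholder contact point
                    .withLoadBalancingPolicy(
                            LatencyAwarePolicy.builder(
                                    new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1")))
                            // skip nodes whose average latency is over 2x the best node's
                            .withExclusionThreshold(2.0)
                            .build())
                    .build();
            cluster.close();
        }
    }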
The coordinator only stores data locally (on a write) if it ends up being one of the nodes responsible for the data's token range.