I read this in the official DSE documentation, but it did not go into depth on how it works. Can someone explain, or provide links that explain how?
In Cassandra, data is stored and retrieved via a partitioning system. A partitioner determines where the primary copy of each piece of data is stored: it hashes the row's partition key into a token. Every node owns, or is responsible for, a set of tokens, so the token derived from the partition key identifies the node that holds the data.
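To make the token idea concrete, here is a minimal toy sketch in Python. It is not Cassandra's code: the MD5-based `token_for` is only a stand-in for the real partitioner, and the `TokenRing` class, node names, and token values are all made up for illustration. It assumes each node owns the range that ends at its token, wrapping around at the end of the ring.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to an integer token (MD5 here, as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Toy ring: maps tokens to the node that owns them (hypothetical helper)."""

    def __init__(self, node_tokens):
        # Flatten {node: [tokens]} into a list of (token, node) sorted by token.
        self._ring = sorted((t, n) for n, ts in node_tokens.items() for t in ts)
        self._tokens = [t for t, _ in self._ring]

    def owner_of_token(self, token):
        # Owner is the node with the first token >= the given token;
        # past the last token we wrap around to the first node.
        idx = bisect.bisect_left(self._tokens, token)
        return self._ring[idx % len(self._ring)][1]

    def primary_owner(self, partition_key):
        return self.owner_of_token(token_for(partition_key))


ring = TokenRing({"node-a": [100], "node-b": [200]})
print(ring.owner_of_token(150))        # falls in (100, 200], owned by node-b
print(ring.primary_owner("user:42"))   # same key always maps to the same node
```

The point is only that the key alone is enough to work out which node is responsible, without asking the cluster.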
Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node for uncompressed data. For Cassandra 1.1, it is 500 to 800GB per node. Be sure to account for replication.
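As a rough illustration of "account for replication", the arithmetic can be sketched like this. The function name and the numbers are assumptions for the example, and it ignores compaction headroom, compression, and multi-DC layouts.

```python
import math


def nodes_needed(raw_data_tb, replication_factor, per_node_capacity_tb):
    """Rough node count: every row is stored replication_factor times."""
    total_tb = raw_data_tb * replication_factor
    by_capacity = math.ceil(total_tb / per_node_capacity_tb)
    # You also need at least as many nodes as replicas.
    return max(replication_factor, by_capacity)


# 10 TB of raw data, RF=3, 3 TB per node: 30 TB on disk in total
print(nodes_needed(10, 3, 3))
```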
Most node failures result from temporary conditions, such as network issues. Therefore, Cassandra assumes the node will eventually come back online, and that permanent cluster changes will be executed explicitly using nodetool.
An Apache Cassandra datacenter is a group of related nodes, configured together within a cluster for replication purposes. Grouping a specific set of related nodes into a datacenter helps to reduce latency, keeps transactions from being impacted by other workloads, and has related benefits.
It's better to look into the architecture guide for this kind of information.
There are several places that could be considered load balancers of a kind. First, you can send a request to any node in the cluster, and that node will act as the "coordinator", forwarding the request to the nodes that actually own the data. Because this is not optimal, the drivers provide a so-called token-aware load balancing policy: the driver infers from the data (the partition key) which nodes are responsible for it, and sends the request to one of those nodes, selected based on other information (contributed by other load balancing policies).
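A toy model of that idea, in Python, might look like the sketch below. These classes are simplified stand-ins that I named after the concepts, not the real driver classes, and the host names and replica lookup are invented. The child policy proposes an order of hosts (plain round robin here), and the token-aware wrapper moves the replicas for the key to the front.

```python
import itertools


class RoundRobinPolicy:
    """Child policy: rotates the starting host on every query plan."""

    def __init__(self, hosts):
        self._hosts = list(hosts)
        self._counter = itertools.count()

    def query_plan(self):
        start = next(self._counter) % len(self._hosts)
        return self._hosts[start:] + self._hosts[:start]


class TokenAwarePolicy:
    """Wrapper: puts replicas of the key first, keeping the child's order."""

    def __init__(self, child, replicas_for_key):
        self._child = child
        self._replicas_for_key = replicas_for_key  # callable: key -> hosts

    def query_plan(self, partition_key=None):
        plan = self._child.query_plan()
        if partition_key is None:
            # No routing key known: fall back to the child's plan as-is.
            return plan
        replicas = set(self._replicas_for_key(partition_key))
        first = [h for h in plan if h in replicas]
        rest = [h for h in plan if h not in replicas]
        return first + rest


hosts = ["n1", "n2", "n3", "n4"]
policy = TokenAwarePolicy(RoundRobinPolicy(hosts), lambda key: {"n2", "n3"})
print(policy.query_plan("user:42"))  # replicas n2/n3 come before n1/n4
```

The non-replica hosts stay in the plan as fallbacks, so any of them can still act as coordinator if the replicas are unreachable.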
With multiple data centers, the drivers and Cassandra itself are able to send requests to "remote" DCs if the "local" one isn't available (the notions of remote and local are specific to each consumer). But in this case other factors come into play: for example, if you use LOCAL_* consistency levels, your requests won't be sent to a "remote" data center.
As for application design, you may put a load balancer in front of your application layer, where each application instance connects to the Cassandra cluster in its "local" data center and uses LOCAL_* consistency levels for its operations. If one of the DCs goes down, the load balancer should stop sending traffic to the application layer in that DC.
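To see why traffic has to be moved away from a degraded DC when local consistency levels are in use, here is a small sketch. The helper names, DC names, and replica counts are assumptions for the example. It uses LOCAL_ONE and LOCAL_QUORUM as examples of local levels, and relies on a quorum being a majority of the relevant replicas (floor(RF / 2) + 1).

```python
def required_acks(consistency_level, replication, local_dc):
    """How many replica acknowledgements a consistency level needs.

    replication maps data center name -> replication factor in that DC.
    """
    if consistency_level == "LOCAL_ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        return replication[local_dc] // 2 + 1
    if consistency_level == "QUORUM":
        return sum(replication.values()) // 2 + 1
    raise ValueError("unsupported consistency level: " + consistency_level)


def can_satisfy(consistency_level, replication, local_dc, live_replicas):
    """live_replicas maps data center name -> number of replicas currently up."""
    if consistency_level.startswith("LOCAL_"):
        # Local levels only count replicas in the coordinator's own DC.
        available = live_replicas.get(local_dc, 0)
    else:
        available = sum(live_replicas.values())
    return available >= required_acks(consistency_level, replication, local_dc)


replication = {"dc1": 3, "dc2": 3}
degraded = {"dc1": 1, "dc2": 3}  # two of dc1's three replicas are down

print(can_satisfy("LOCAL_QUORUM", replication, "dc1", degraded))  # False
print(can_satisfy("LOCAL_QUORUM", replication, "dc2", degraded))  # True
print(can_satisfy("QUORUM", replication, "dc1", degraded))        # True
```

In this example an application pinned to dc1 with a local quorum cannot complete its requests, while the same application running in dc2 can, which is why the front load balancer should redirect users rather than relying on Cassandra to cross DCs.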
Load balancing is built into the drivers/connections. For example, the Java driver's round-robin behavior is explained in the documentation here:
https://docs.datastax.com/en/developer/java-driver-dse/1.6/manual/load_balancing/
Also explained here:
https://docs.datastax.com/en/developer/java-driver/3.1/manual/load_balancing/