How is Cassandra designed to avoid the need for load balancers?

I read this in the official DSE documentation, but it did not go into depth about how. Can someone explain, or provide links that explain how?

asked May 17 '18 21:05 by user2345093





2 Answers

It's better to look at the architecture guide for this kind of information.

There are several places that act as a kind of load balancer. First, you can send a request to any node in the cluster, and that node will act as the "coordinator", forwarding the request to the nodes that actually own the data. Because this is not optimal (the coordinator may need an extra network hop to reach the replicas), drivers provide a so-called token-aware load balancing policy: the driver infers from the data which nodes are responsible for it and sends the request to one of those nodes, selected using information contributed by other load balancing policies.
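To make the difference between "any node coordinates" and "token-aware" routing concrete, here is a minimal toy model in Python. It is not the driver's implementation: the four node names, the single token per node, the tiny token range and the MD5-based `token_for` function are all hypothetical simplifications (real clusters typically use `Murmur3Partitioner` and vnodes). Replicas are found by walking the ring clockwise from the key's token, similar in spirit to `SimpleStrategy`.

```python
import bisect
import hashlib
import itertools

# Hypothetical 4-node ring with one token per node. Each node owns the range
# (previous node's token, its own token], wrapping around at the end.
RING = [(-6_000, "node1"), (-1_000, "node2"), (3_000, "node3"), (8_000, "node4")]
TOKEN_SPACE = 20_000  # toy token range [-10_000, 10_000)


def token_for(partition_key: str) -> int:
    """Stand-in partitioner: hash the partition key into the toy token range."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_SPACE - TOKEN_SPACE // 2


def replicas_for_token(token: int, rf: int = 2) -> list[str]:
    """Owner of the token plus the next rf-1 nodes clockwise (SimpleStrategy-like)."""
    tokens = [t for t, _ in RING]
    start = bisect.bisect_left(tokens, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


_counter = itertools.count()


def choose_node(partition_key: str, token_aware: bool) -> str:
    """Pick the node the client sends the request to."""
    if token_aware:
        # Only replicas are candidates, so the receiving node holds a copy of the data.
        candidates = replicas_for_token(token_for(partition_key))
    else:
        # Any node may act as coordinator and forward the request to the replicas.
        candidates = [name for _, name in RING]
    return candidates[next(_counter) % len(candidates)]


key = "user:42"
print("replicas:", replicas_for_token(token_for(key)))
print("token-aware pick:", choose_node(key, token_aware=True))
print("any-node pick:", choose_node(key, token_aware=False))
```

Rotating through the candidate list is one simple way to spread load across replicas; real token-aware policies delegate that ordering to the child policy they wrap.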

With multiple data centers, both the drivers and Cassandra itself can send requests to a "remote" DC if the "local" one isn't available (the notions of remote and local are specific to each client). In this case, however, other factors come into play: for example, if you use LOCAL_ consistency levels (such as LOCAL_QUORUM), your requests won't be sent to the remote data center.
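The local-first, remote-as-fallback behavior, and the way LOCAL_ consistency levels rule out the fallback, can be sketched as a toy "query plan" function. This is only an illustration under assumptions: the `Node` class, the DC names and the `remote_hosts_per_dc` parameter are hypothetical and do not reproduce any driver's API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    dc: str
    up: bool = True


def query_plan(nodes, local_dc, consistency, remote_hosts_per_dc=1):
    """Order in which a DC-aware client would try nodes (toy model).

    Live local-DC nodes come first. Remote DCs are only a fallback, and are
    skipped entirely for LOCAL_* consistency levels.
    """
    plan = [n.name for n in nodes if n.dc == local_dc and n.up]
    if consistency.startswith("LOCAL_"):
        return plan
    for dc in sorted({n.dc for n in nodes if n.dc != local_dc}):
        live_remote = [n.name for n in nodes if n.dc == dc and n.up]
        plan.extend(live_remote[:remote_hosts_per_dc])
    return plan


cluster = [
    Node("dc1-a", "DC1"),
    Node("dc1-b", "DC1", up=False),
    Node("dc2-a", "DC2"),
    Node("dc2-b", "DC2"),
]
print(query_plan(cluster, "DC1", "QUORUM"))        # ['dc1-a', 'dc2-a']
print(query_plan(cluster, "DC1", "LOCAL_QUORUM"))  # ['dc1-a']
```

If every node in DC1 were down, a LOCAL_ request in this model would get an empty plan rather than silently crossing into DC2.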

As for application design, you can put a load balancer in front of your application layer, where each application instance connects to the Cassandra cluster in its "local" data center and uses LOCAL_ consistency levels for its operations. If one of the DCs goes down, the load balancer should stop sending traffic to the application layer in that DC.
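A minimal sketch of that layout, assuming each application server only talks to Cassandra nodes in its own DC with LOCAL_ consistency levels: the front load balancer then only needs to know which DCs are healthy. The class, server names and the `set_dc_health` hook are hypothetical; in practice this role is played by a real load balancer driven by health checks.

```python
import itertools


class FrontLoadBalancer:
    """Toy load balancer in front of per-DC application tiers.

    Each app server is assumed to talk only to Cassandra nodes in its own DC
    using LOCAL_* consistency levels, so routing by DC health is enough.
    """

    def __init__(self, app_servers_by_dc: dict[str, list[str]]):
        self.app_servers_by_dc = app_servers_by_dc
        self.healthy_dcs = set(app_servers_by_dc)
        self._counter = itertools.count()

    def set_dc_health(self, dc: str, healthy: bool) -> None:
        if healthy:
            self.healthy_dcs.add(dc)
        else:
            self.healthy_dcs.discard(dc)

    def route(self) -> str:
        # Round-robin over the app servers of all currently healthy DCs.
        servers = [s for dc in sorted(self.healthy_dcs)
                   for s in self.app_servers_by_dc[dc]]
        if not servers:
            raise RuntimeError("no healthy data center available")
        return servers[next(self._counter) % len(servers)]


lb = FrontLoadBalancer({"DC1": ["app-dc1-1", "app-dc1-2"], "DC2": ["app-dc2-1"]})
lb.set_dc_health("DC1", False)  # e.g. DC1 failed its health checks
print([lb.route() for _ in range(3)])  # all requests go to DC2's app tier
```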

answered Oct 05 '22 04:10 by Alex Ott



Load balancing is built into the drivers/connections. For example, the Java driver's "roundrobin" behavior is explained in the documentation here:

https://docs.datastax.com/en/developer/java-driver-dse/1.6/manual/load_balancing/

It is also explained here (open-source Java driver 3.1):

https://docs.datastax.com/en/developer/java-driver/3.1/manual/load_balancing/
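As a rough illustration of the round-robin idea described in those pages, the toy policy below gives each query a "query plan" listing every live node, rotated by one position per query, and the client falls back to the next node if one is unreachable. The class and the `new_query_plan` / `execute` names are hypothetical and only echo the driver's concept of a query plan; they are not its API.

```python
import itertools
from typing import Callable


class RoundRobinPolicy:
    """Toy client-side policy: each query gets all live nodes, rotated by one."""

    def __init__(self, nodes: list[str]):
        self.nodes = list(nodes)
        self._counter = itertools.count()

    def new_query_plan(self, is_up: Callable[[str], bool] = lambda n: True) -> list[str]:
        start = next(self._counter) % len(self.nodes)
        rotated = self.nodes[start:] + self.nodes[:start]
        return [n for n in rotated if is_up(n)]


def execute(policy: RoundRobinPolicy, send: Callable[[str], str]) -> str:
    """Try nodes in query-plan order, moving on when one is unreachable."""
    for node in policy.new_query_plan():
        try:
            return send(node)
        except ConnectionError:
            continue
    raise RuntimeError("no host available")


policy = RoundRobinPolicy(["node1", "node2", "node3"])
print(policy.new_query_plan())  # ['node1', 'node2', 'node3']
print(policy.new_query_plan())  # ['node2', 'node3', 'node1']
```

Because every client spreads its own requests this way and retries the next node on failure, no separate load balancer is needed between the application and the cluster.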

answered Oct 05 '22 03:10 by spencer7593
