I'm trying to figure out how a system like Cassandra, which uses consistent hashing, handles cascading node failures. I know there's the concept of virtual nodes: keys are mapped to virtual nodes, and virtual nodes are in turn mapped to physical nodes, the idea being that each physical node gets an equal share of the keyspace. My question is this: what happens when a physical node goes down? All the virtual nodes on that physical node will need to be moved to another physical node. Won't this lead to a domino effect that could overload other nodes in the cluster? How do real-life systems handle cases like this?
When a node goes down, its token ranges are not moved automatically. If the node is permanently down, the token ranges have to be rebalanced manually, by removing the node (nodetool removenode) or assassinating it (nodetool assassinate).
While a node that is responsible for some token ranges is down, the other replicas of those ranges continue to serve the traffic.
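To make this concrete, here is a minimal Python sketch of a token ring with virtual nodes. It is a toy model, not Cassandra's actual code: the Ring class, the MD5-based token function, the default of 8 vnodes per node, and the "walk clockwise until you have RF distinct nodes" placement rule are all assumptions chosen for illustration. It shows that marking a node as down does not change which nodes own a key, and that the remaining replicas can still serve it. Ownership only changes when the node is explicitly removed.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hypothetical token function: first 8 bytes of MD5, for illustration only.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring with virtual nodes (not Cassandra's implementation)."""

    def __init__(self, nodes, vnodes=8, rf=3):
        self.rf = rf
        self.down = set()
        # Each physical node owns `vnodes` tokens scattered around the ring.
        self.ring = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        # Walk clockwise from the key's token and collect rf distinct physical nodes.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        owners = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.rf:
                break
        return owners

    def live_replicas(self, key):
        # A down node still owns its ranges; it is just skipped when serving requests.
        return [n for n in self.replicas(key) if n not in self.down]

    def remove(self, node):
        # Manual removal: only now do the node's token ranges pass to other nodes.
        self.ring = [(t, n) for t, n in self.ring if n != node]
        self.tokens = [t for t, _ in self.ring]
        self.down.discard(node)


ring = Ring(["a", "b", "c", "d", "e"])
owners_before = ring.replicas("user:42")
failed = owners_before[0]

ring.down.add(failed)                      # node failure: no token movement
owners_while_down = ring.replicas("user:42")
serving = ring.live_replicas("user:42")    # the two surviving replicas

ring.remove(failed)                        # explicit removal: ownership changes
owners_after_removal = ring.replicas("user:42")
```

In this sketch a failure only shrinks the set of live replicas for the affected ranges. Because the removed node's tokens are scattered around the ring, its ranges are handed to several different nodes rather than to a single neighbour, which is the usual argument for why virtual nodes limit the domino effect.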