
How Cassandra replicates data

Tags:

cassandra

I am curious about how replication works in Cassandra. I read the DataStax page on data distribution:

http://www.datastax.com/docs/1.2/cluster_architecture/data_distribution

The consistent hashing section says that Cassandra computes a hash value for each primary key and, based on that value, sends the data to the node responsible for it. It then shows how data is distributed across a cluster. My question is: how does Cassandra copy this data to the other nodes in the cluster based on the hash value?

This may be a very basic question. Please explain with an example if possible.

Harish Kumar asked May 31 '13 10:05



1 Answer

How replicas are chosen depends on the replication strategy. For SimpleStrategy with replication factor N and no virtual nodes, Cassandra does the following:

  1. Hash the key.
  2. Find the node with the smallest token greater than or equal to the hash, wrapping around to the lowest token if necessary.
  3. Store the key on that node and on the next N-1 nodes in token order.

As an example, suppose your nodes have tokens 0, 10, 20, 30 and your replication factor is 2. If your key has hash 14 then it will be stored on the nodes with tokens 20 and 30. If your key has hash 28 then it will be stored on the nodes with tokens 30 and 0.
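The three steps and the example above can be sketched in a few lines of Python. This is only an illustration of the ring walk described here, not Cassandra's actual code; the function name `simple_replicas` and the use of small integers as tokens are assumptions made for the example.

```python
from bisect import bisect_left

def simple_replicas(tokens, key_hash, rf):
    """Return the tokens of the nodes that would store a key.

    Hypothetical helper: models the SimpleStrategy walk without vnodes.
    """
    ring = sorted(tokens)
    # Index of the first token >= key_hash; wraps to 0 past the last token.
    start = bisect_left(ring, key_hash) % len(ring)
    # Never return more replicas than there are nodes.
    count = min(rf, len(ring))
    # Take that node and the following nodes in token order, wrapping around.
    return [ring[(start + i) % len(ring)] for i in range(count)]

print(simple_replicas([0, 10, 20, 30], 14, 2))  # [20, 30]
print(simple_replicas([0, 10, 20, 30], 28, 2))  # [30, 0]
```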

With virtual nodes the same idea applies, but a virtual node is skipped as a replica if its physical node has already received the key.
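A rough sketch of that skipping rule, assuming the ring is modelled as a list of `(token, physical_node)` pairs. The function name `vnode_replicas` and the node names are made up for illustration, and this is not how Cassandra represents its ring internally.

```python
from bisect import bisect_left

def vnode_replicas(ring, key_hash, rf):
    """Return the distinct physical nodes that would store a key.

    ring: list of (token, physical_node) pairs, one pair per virtual node.
    Hypothetical helper for illustration only.
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key_hash) % len(ring)
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        # Skip a virtual node whose physical node already holds the key.
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas

ring = [(0, "A"), (10, "B"), (20, "B"), (30, "C"), (40, "A"), (50, "C")]
# Hash 5 lands on token 10 (node B); token 20 also belongs to B and is skipped.
print(vnode_replicas(ring, 5, 2))  # ['B', 'C']
```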

With NetworkTopologyStrategy, a node is skipped if the replica quota for its data center has already been reached.
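The per-data-center quota can be sketched the same way. This is a simplified model under stated assumptions: the ring is a list of `(token, node, data_center)` tuples, `rf_per_dc` maps each data center to its replication factor, and rack placement (which the real strategy also takes into account) is ignored. The name `nts_replicas` is hypothetical.

```python
from bisect import bisect_left

def nts_replicas(ring, key_hash, rf_per_dc):
    """Return the nodes that would store a key, honouring per-DC quotas.

    ring: list of (token, node, data_center) tuples.
    rf_per_dc: dict mapping data center name to its replication factor.
    Hypothetical helper; racks are deliberately ignored.
    """
    ring = sorted(ring)
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_hash) % len(ring)
    placed = {dc: 0 for dc in rf_per_dc}
    replicas = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        # Skip nodes whose data center already has its quota of replicas
        # (data centers absent from rf_per_dc have a quota of zero).
        if placed.get(dc, 0) >= rf_per_dc.get(dc, 0):
            continue
        # Skip a node that already holds the key.
        if node in replicas:
            continue
        replicas.append(node)
        placed[dc] += 1
    return replicas

ring = [
    (0, "a1", "dc1"), (10, "a2", "dc1"), (20, "b1", "dc2"),
    (30, "a3", "dc1"), (40, "b2", "dc2"),
]
# Walk from token 10: a2 (dc1), b1 (dc2), a3 (dc1); b2 and a1 exceed the quotas.
print(nts_replicas(ring, 5, {"dc1": 2, "dc2": 1}))  # ['a2', 'b1', 'a3']
```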

Richard answered Sep 21 '22 06:09