Cassandra replication factor greater than number of nodes


I am using the DataStax Java driver for Apache Cassandra (v. 2.1.9) and I am wondering what should happen when I set replication_factor greater than the number of nodes. I've read somewhere that Cassandra allows this operation, but that it should fail when I try to save some data (of course it depends on the write consistency level, but I mean the case of ALL).
The problem is that everything works and no exception is thrown, even when I try to save data. Why?
Maybe the information I read was old and applied to older versions of Cassandra? One more question: if it's true, then what would happen when I add another node to the cluster?

bemol asked Mar 09 '16 11:03



2 Answers

Cassandra has a concept of "tunable consistency" which in part means you can control the consistency level setting for read/write operations.

You can read a bit more in the docs explaining consistency levels and how to set them in the cqlsh shell.

To learn more, I suggest experimenting with cqlsh on a single-node Cassandra instance. For example, we can create a keyspace with a replication factor of 2 and load some data into it:

cqlsh> create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor':2};
cqlsh> create table test.keys (key int primary key, val int);
cqlsh> insert into test.keys (key, val) values (1, 1);
cqlsh> select * from test.keys;

 key | val
-----+-----
   1 |   1 

Everything works fine because the default consistency level is ONE, so only one replica had to be online to acknowledge the request. Now try the same after setting the consistency level to ALL:

cqlsh> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh> insert into test.keys (key, val) values (2, 2);
Traceback (most recent call last):
  File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
    result = future.result()
  File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}

cqlsh> select * from test.keys;
Traceback (most recent call last):
  File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
    result = future.result()
  File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}

Neither reads nor writes will work, because the 2nd node doesn't exist. The error message gives a helpful clue: two replicas were required but only one was alive.
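
The check behind that error can be sketched in a few lines of Python. This is a simplified model, not Cassandra's actual code: the function names `required_replicas` and `check_available` are made up for illustration, and only the levels ONE, TWO, THREE, QUORUM and ALL are modeled (no datacenter-aware levels).

```python
def required_replicas(consistency: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if consistency in fixed:
        return fixed[consistency]
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency}")


def check_available(consistency: str, replication_factor: int, live_nodes: int) -> bool:
    # A keyspace can never have more live replicas than there are live nodes,
    # even when replication_factor is larger than the cluster.
    alive = min(replication_factor, live_nodes)
    return alive >= required_replicas(consistency, replication_factor)


# Single node with replication_factor 2, as in the cqlsh session.
print(check_available("ONE", 2, 1))  # True
print(check_available("ALL", 2, 1))  # False
print(check_available("ALL", 2, 2))  # True
```

Under this model, a replication factor larger than the cluster only becomes visible once the consistency level demands more replicas than there are live nodes.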

Once you have an understanding using cqlsh, you can apply the same using the Java drivers, depending on what your application needs.

BrianC answered Oct 14 '22 00:10


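You shouldn't set the replication factor higher than the number of nodes, because Cassandra achieves strong consistency only when the sum of the write replica count and the read replica count is greater than the replication factor (W + R > RF), and replicas that have no node to live on can never respond.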
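
The rule of thumb about write and read replica counts is usually written as W + R > RF: if the replicas that acknowledge a write and the replicas consulted by a read must overlap, the read sees the latest write. A minimal sketch, using a hypothetical helper named `is_strongly_consistent`:

```python
def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
quorum = rf // 2 + 1  # 2 replicas when RF is 3

print(is_strongly_consistent(quorum, quorum, rf))  # True: QUORUM writes, QUORUM reads
print(is_strongly_consistent(1, 1, rf))            # False: ONE writes, ONE reads
print(is_strongly_consistent(rf, 1, rf))           # True: ALL writes, ONE reads
```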

For instance, suppose you have 5 nodes and have set the replication factor to 5. If 1 node goes down, requests at consistency level ALL can no longer succeed, so you lose the advantage of Cassandra's availability.
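
To put numbers on the 5-node case, here is a rough sketch of how many replica failures each consistency level can survive. The helper `tolerated_failures` is hypothetical, and it assumes every node holds a replica, which holds when the replication factor equals the number of nodes.

```python
def tolerated_failures(replication_factor: int) -> dict:
    """Replica failures each consistency level can survive for a given RF."""
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return {level: replication_factor - count for level, count in required.items()}


print(tolerated_failures(5))  # {'ONE': 4, 'QUORUM': 2, 'ALL': 0}
```

With a replication factor of 5, a single failed node already rules out ALL, while QUORUM keeps working until a third node is lost.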

After you add nodes, you can increase the replication factor to match, since a consistency level never requires a write to reach more replicas than the replication factor specifies.

Rishi answered Oct 14 '22 01:10