I am using the DataStax Java driver for Apache Cassandra (v. 2.1.9) and I am wondering what should happen when I set replication_factor greater than the number of nodes. I've read somewhere that Cassandra allows this operation, but that it should fail when I try to save some data (it depends on the write consistency level, of course, but I mean the case of ALL).
The problem is that everything works and no exception is thrown, even when I try to save data. Why?
Maybe the information I've read was outdated and applied to older versions of Cassandra?
One more question: if it is true, what would happen when I add another node to the cluster?
In a production system with three or more Cassandra nodes in each data center, the default replication factor for an Edge keyspace is three. As a general rule, the replication factor should not exceed the number of Cassandra nodes in the cluster.
If you want to change the replication factor of a keyspace, you can do it by executing the ALTER KEYSPACE command, which has the following syntax: ALTER KEYSPACE "KeySpace Name" WITH replication = {'class': 'Strategy name', 'replication_factor': <number of replicas>};
Can I change the replication factor of a keyspace on a live cluster? Yes, but it will require running a full repair (or cleanup) to change the replica count of existing data: alter the replication factor for the desired keyspace (using ALTER KEYSPACE in cqlsh, for instance), then repair.
As we said earlier, each instance of Cassandra is by default assigned 256 virtual nodes. The Cassandra server runs core processes, such as spreading replicas around nodes and routing requests.
If you add more Cassandra nodes to the cluster, the default replication factor is not affected. For example, if you increase the number of Cassandra nodes to six but leave the replication factor at three, you do not ensure that all Cassandra nodes have a copy of all the data. However, if a node goes down, a higher replication factor means a higher probability that the data on that node exists on one of the remaining nodes.
Use the following procedure to view the Cassandra schema, which shows the replication factor for each Edge keyspace:
1. Log in to a Cassandra node.
2. Run the following command:
> /opt/apigee/apigee-cassandra/bin/cassandra-cli -h $(hostname -i) <<< "show...
Cassandra achieves high availability and fault tolerance by replication of the data across nodes in a cluster. The replication strategy determines where replicas are stored in the cluster. Each node in the cluster owns not only the data within an assigned token range but also the replica for a different range of data.
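The placement rule described above can be sketched in a few lines of Java. This is an illustrative sketch, not Cassandra's actual implementation; the class and method names are my own, and the ring is simplified to one token per node. SimpleStrategy places the first replica on the node owning the key's token and the remaining replicas on the next nodes walking clockwise around the ring:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of SimpleStrategy replica placement: the replicas for a key are the
// node owning the key's token plus the next RF-1 distinct nodes clockwise.
public class ReplicaPlacement {
    static List<String> replicasFor(int ownerIndex, List<String> ring, int rf) {
        List<String> replicas = new ArrayList<>();
        // If RF exceeds the ring size, we simply run out of distinct nodes.
        for (int i = 0; i < Math.min(rf, ring.size()); i++) {
            replicas.add(ring.get((ownerIndex + i) % ring.size()));
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("node1", "node2", "node3");
        // With RF=2, a key owned by node3 is also replicated on node1 (wrap-around).
        System.out.println(replicasFor(2, ring, 2)); // [node3, node1]
    }
}
```

Note the wrap-around: the ring is circular, so a key owned by the "last" node still gets its extra replicas from the "first" nodes.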
Cassandra has a concept of "tunable consistency," which in part means you can control the consistency level for read and write operations.
You can read a bit more in the docs explaining consistency levels and how to set them in the cqlsh shell.
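To make the levels concrete, here is a small sketch of how many replicas must respond for the common consistency levels, given a keyspace's replication factor (RF). The class and method names are my own; this only mirrors the semantics of the driver's ConsistencyLevel values, it is not the driver API:

```java
// Sketch: replicas that must acknowledge a request at each consistency level.
public class RequiredReplicas {
    static int required(String level, int rf) {
        switch (level) {
            case "ONE":    return 1;          // any single replica suffices
            case "QUORUM": return rf / 2 + 1; // a majority of the replicas
            case "ALL":    return rf;         // every replica must respond
            default: throw new IllegalArgumentException(level);
        }
    }

    public static void main(String[] args) {
        int rf = 2; // as in the single-node keyspace created below
        System.out.println(required("ONE", rf)); // 1 -> satisfiable with one node up
        System.out.println(required("ALL", rf)); // 2 -> unsatisfiable on one node
    }
}
```

With RF=2 on a single node, ONE succeeds but ALL cannot, which is exactly what the cqlsh session below demonstrates.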
To learn more, I suggest experimenting with cqlsh on a single-node Cassandra installation. For example, we can create a keyspace with a replication factor of 2 and load some data into it:
cqlsh> create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor':2};
cqlsh> create table test.keys (key int primary key, val int);
cqlsh> insert into test.keys (key, val) values (1, 1);
cqlsh> select * from test.keys;
 key | val
-----+-----
   1 |   1
Everything works fine because the default consistency level is ONE, so only one node had to be online. Now try the same after setting the level to ALL:
cqlsh> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh> insert into test.keys (key, val) values (2, 2);
Traceback (most recent call last):
File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
result = future.result()
File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}
cqlsh> select * from test.keys;
Traceback (most recent call last):
File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
result = future.result()
File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}
Neither reads nor writes work, because the second replica doesn't exist. The error message gives the helpful clue that two replicas were required but only one was alive.
Once you have an understanding using cqlsh, you can apply the same using the Java drivers, depending on what your application needs.
The reason you shouldn't set the replication factor higher than the number of nodes relates to how Cassandra achieves strong consistency: a read is guaranteed to see the latest write when the write replica count plus the read replica count is greater than the replication factor.
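That rule (often written W + R > RF) can be checked with one comparison. This is an illustrative sketch under that assumption, not a Cassandra API; the names are my own:

```java
// Sketch of the tunable-consistency rule: a read overlaps with the latest
// write (and is therefore strongly consistent) when the number of replicas
// written plus the number of replicas read exceeds the replication factor.
public class ConsistencyRule {
    static boolean stronglyConsistent(int writeReplicas, int readReplicas, int rf) {
        return writeReplicas + readReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(stronglyConsistent(2, 2, rf)); // QUORUM + QUORUM: true
        System.out.println(stronglyConsistent(1, 1, rf)); // ONE + ONE: false
    }
}
```

This is why QUORUM reads plus QUORUM writes are the usual recipe: any read quorum must overlap any write quorum.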
For instance, suppose you have 5 nodes and set the replication factor to 5. If one node goes down, requests at consistency level ALL can no longer be satisfied, and you have lost the advantage of Cassandra's availability.
After you add nodes, you can increase the replication factor accordingly; note that the consistency level can never require more replicas than the replication factor specifies.
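The trade-off above can be sketched as a simple count of tolerated node failures. This is an illustrative calculation with names of my own choosing, not a Cassandra API; it shows why RF equal to the cluster size plus CL=ALL tolerates zero failures:

```java
// Sketch: node failures each consistency level tolerates, given RF replicas
// of every row. Failures tolerated = replicas that may be down while the
// required replica count can still be reached.
public class FaultTolerance {
    static int quorum(int rf) { return rf / 2 + 1; }

    static int toleratedFailures(int requiredReplicas, int rf) {
        return rf - requiredReplicas;
    }

    public static void main(String[] args) {
        int rf = 5; // 5 nodes, replication factor 5
        System.out.println(toleratedFailures(rf, rf));         // ALL: 0 failures
        System.out.println(toleratedFailures(quorum(rf), rf)); // QUORUM: 2 failures
    }
}
```

With RF=5 and QUORUM, two nodes can die and the cluster still serves every request, which is the availability that CL=ALL gives up.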