 

Unbalanced Cassandra cluster

Update - Short version:
The PropertyFileSnitch file cassandra-topology.properties on the first 3 nodes (Rack 1-3) lists only those nodes in DC1 and assigns every other node to DC2 via the default entry default=DC2:r1. When the cluster was scaled up with nodes 4 and 5, the PropertyFileSnitch on the new nodes was configured to place them in DC1 as well (Rack 4 and Rack 5), but the file on the first 3 nodes was never updated, which left the cluster in this inconsistent state.
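For illustration, the two versions of the snitch file would look roughly like this (the entries below are reconstructed from the nodetool output further down, so treat them as an approximation of the real files):

# cassandra-topology.properties on nodes 1-3 (stale): only the original nodes are in DC1
10.0.0.10=DC1:Rack1
10.0.0.11=DC1:Rack2
10.0.0.12=DC1:Rack3
# any node not listed above (including 10.0.0.13 and 10.0.0.14) falls back to DC2
default=DC2:r1

# cassandra-topology.properties on nodes 4-5 (updated): all five nodes are in DC1
10.0.0.10=DC1:Rack1
10.0.0.11=DC1:Rack2
10.0.0.12=DC1:Rack3
10.0.0.13=DC1:Rack4
10.0.0.14=DC1:Rack5
default=DC2:r1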

My question is whether this cluster can be rebalanced (fixed). Would it suffice to do a full cluster restart after fixing the cassandra-topology.properties?
Please advise on how I can safely rebalance the cluster.

Longer version:

I am new to Cassandra and started working on an already built cluster.
I have 5 nodes in the same physical data center, on different racks, running Cassandra 3.0.5 with vnodes (num_tokens: 256) and a keyspace with replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true.
Historically there were only 3 nodes; the cluster was later scaled up with an additional 2 nodes. I have an automated repair script that runs nodetool repair with the options parallelism: parallel, primary range: false, incremental: true, job threads: 1.
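For reference, that combination of options corresponds roughly to the invocation below (the keyspace name is a placeholder; on 3.0, incremental and parallel repairs are already the defaults, so only the job-thread count needs to be passed explicitly):

nodetool repair -j 1 my_keyspace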

After a large amount of data was inserted, the problems started to appear. When the repair script runs on node 4 or 5, node 2 gets overloaded: CPU usage stays at 100%, the MutationStage queue keeps growing, and GC pauses last at least 1s until the Cassandra process finally dies. The repair usually fails with the error Stream failed (progress: 0%).
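One way to watch that backlog building on node 2 while the repair runs is to poll the thread-pool statistics (standard nodetool command; the grep is only there to trim the output):

nodetool tpstats | grep -E 'Pool Name|MutationStage'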

When running the nodetool status command on nodes 1, 2 or 3 I get the following output:

Datacenter: DC2
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.13    10.68 GB   256     0.0%              75e17b8a   r1
UN  10.0.0.14    9.43 GB    256     0.0%              21678ddb   r1
Datacenter: DC1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.10    16.14 GB   256     100.0%            cf9d327f   Rack1
UN  10.0.0.11    22.83 GB   256     100.0%            e725441e   Rack2
UN  10.0.0.12    19.66 GB   256     100.0%            95b5c8e3   Rack3

But when running the nodetool status command on nodes 4 or 5 I get the following output:

Datacenter: DC1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.13   10.68 GB   256     58.9%             75e17b8a   Rack4
UN  10.0.0.14   9.43 GB    256     61.1%             21678ddb   Rack5
UN  10.0.0.10   16.14 GB   256     60.3%             cf9d327f   Rack1
UN  10.0.0.11   22.83 GB   256     61.4%             e725441e   Rack2
UN  10.0.0.12   19.66 GB   256     58.3%             95b5c8e3   Rack3

After further investigation, it turned out that cassandra-topology.properties (the PropertyFileSnitch configuration) was never updated on nodes 1, 2 and 3 (which are also the seeds for this cluster) after the cluster was scaled up.

Thanks!

1 Answer

After searching several online resources I found some possible solutions. I'll post them here so they are accessible to everyone.

From Practical Cassandra: A Developer's Approach:

Ring View Differs between Nodes
When the ring view differs between nodes, it is never a good thing. There is also no easy way to recover from this state. The only way to recover is to do a full cluster restart. A rolling restart won’t work because the Gossip protocol from the bad nodes will inform the newly booting good nodes of the bad state. A full cluster restart and bringing the good nodes up first should enable the cluster to come back up in a good state.

The same solution can also be found in the DataStax docs: View of ring differs between some nodes
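Putting the pieces together, a full cluster restart for this case could look roughly like the following (the service name and file path are the usual package-install defaults and may differ on your systems):

# 1. On every node: fix cassandra-topology.properties so all five nodes map to DC1
sudo vi /etc/cassandra/cassandra-topology.properties

# 2. On every node: flush memtables and stop Cassandra cleanly
nodetool drain
sudo service cassandra stop

# 3. Start the seed nodes (1, 2, 3) first, then nodes 4 and 5
sudo service cassandra start

# 4. Verify that every node now reports the same single-DC ring
nodetool status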

I also found a similar question asked in the Apache Cassandra community. The answer from the users thread is:

What has happened is that you now have two datacenters in your cluster. The way they replicate information will depend on your keyspace settings. Regarding your process, I don't think it is safe to do it that way. I'd start off by decommissioning nodes 4 and 5 so that your cluster is back to one datacenter with 3 nodes, and then add them again sequentially, making sure the configuration in the snitch is the proper one.
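As a rough sketch, that decommission-and-rejoin approach would look something like the commands below, run on one node at a time (node 4 first, then node 5); the data and configuration paths are the usual package defaults:

# Stream the node's data back to the remaining replicas and leave the ring
nodetool decommission
nodetool netstats          # monitor streaming progress until it completes

# While the node is out: make sure cassandra-topology.properties is consistent
# on ALL nodes (1-5), mapping every node to DC1
sudo vi /etc/cassandra/cassandra-topology.properties

# Clear the old data so the node bootstraps fresh
sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches

# Start Cassandra again and confirm the node rejoins DC1 with the expected rack
sudo service cassandra start
nodetool status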
