
How to migrate single-token cluster to a new vnodes cluster without downtime?

We have a Cassandra cluster with a single token per node: 22 nodes in total, with an average load of 500 GB per node. The main keyspace uses SimpleStrategy, and the cluster uses SimpleSnitch.

We need to migrate all data to a new datacenter and shut down the old one without downtime. The new cluster has 28 nodes, and I want to use vnodes on it.

I'm thinking of the following process:

  1. Migrate the old cluster to vnodes
  2. Setup the new cluster with vnodes
  3. Add nodes from the new cluster to the old one and wait until it balances everything
  4. Switch clients to the new cluster
  5. Decommission nodes from the old cluster one by one

But there are a lot of technical details. First of all, should I shuffle the old cluster after the vnodes migration? Then, what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch? I want to switch to NetworkTopologyStrategy because the new cluster has 2 different racks with separate power/network switches.

asked Mar 12 '13 by relgames
1 Answer

should I shuffle the old cluster after vnodes migration?

You don't need to. If you go from one token per node to 256 (the default), each node splits its range into 256 adjacent, equally sized sub-ranges. This doesn't change where data lives, but it means that when you bootstrap a new node in the new DC, the cluster stays balanced throughout the process.
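Concretely, the vnodes migration on the old cluster is just a rolling configuration change. A sketch of the relevant `cassandra.yaml` setting (apply it node by node with a rolling restart; exact file location depends on your install):

```yaml
# cassandra.yaml on each existing (old-cluster) node
num_tokens: 256
# initial_token must be left unset/commented out once num_tokens is in effect:
# initial_token:
```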

what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch?

The difficulty is that switching replication strategy is in general not safe, since data would need to be moved around the cluster. NetworkTopologyStrategy (NTS) will place data on different nodes if you tell it nodes are in different racks. For this reason, you should move to NTS before adding the new nodes.

Here is a method to do this, after you have upgraded the old cluster to vnodes (your step 1 above):

1a. List all existing nodes as being in DC0 in the properties file. List the new nodes as being in DC1 and their correct racks.
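With GossipingPropertyFileSnitch, each node declares its own DC and rack in its local `cassandra-rackdc.properties` file rather than in a single cluster-wide file. The DC and rack names below are illustrative, a sketch of step 1a:

```ini
# cassandra-rackdc.properties on every OLD node
dc=DC0
rack=RAC1

# cassandra-rackdc.properties on a NEW node
dc=DC1
rack=RAC1    # or RAC2, matching the node's physical rack
```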

1b. Change the replication strategy to NTS with options DC0:3 (or whatever your current replication factor is) and DC1:0.
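Assuming cqlsh and a keyspace called `main_ks` (the keyspace name is a placeholder for your own), step 1b could look like:

```sql
-- Switch to NTS before any DC1 nodes join the ring
ALTER KEYSPACE main_ks WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC0': 3,   -- match your current replication factor
  'DC1': 0    -- no replicas in the new DC yet
};
```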

Then, to add the new nodes, follow the process here: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-a-data-center-to-a-cluster. Remember to set `num_tokens` to 256 on the new nodes, since it defaults to 1.
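From memory, that add-a-datacenter procedure boils down to roughly the following per new node; verify the details against the linked docs for your version. Note that the linked procedure also includes raising DC1's replication factor and streaming existing data over with `nodetool rebuild` before you cut clients over:

```shell
# In cassandra.yaml on each NEW node before first start:
#   num_tokens: 256          # defaults to 1 if unset
#   auto_bootstrap: false    # don't stream on join; we rebuild below
#   seeds: ...               # include seed nodes from BOTH datacenters

# Once all DC1 nodes are up and the keyspace has a non-zero
# replication factor in DC1, stream existing data from the old DC:
nodetool rebuild -- DC0
```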

In step 5, set the replication factor for DC0 to 0, i.e. change the replication options to DC0:0, DC1:3. At that point the old nodes hold no replicas, so decommission won't stream any data, but you should still decommission them rather than just powering them off, so that they are removed from the ring.
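A sketch of step 5 (again, `main_ks` is a placeholder keyspace name):

```shell
# Drop DC0 out of replication
cqlsh -e "ALTER KEYSPACE main_ks WITH replication = {
  'class': 'NetworkTopologyStrategy', 'DC0': 0, 'DC1': 3};"

# Then, on each old node in turn:
nodetool decommission
```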

Note that one risk is that writes made at a low consistency level to the old nodes could be lost. To guard against this, you could write at CL.LOCAL_QUORUM after you switch to the new DC. There is still a small window (between steps 3 and 4) where writes could be lost. If that matters, run repair before decommissioning the old nodes, or write at a higher consistency level, to guarantee nothing is lost.
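For that last point, a sketch of the two guards (cqlsh shown; drivers expose an equivalent consistency-level setting):

```shell
# In cqlsh sessions (or the driver equivalent) after switching clients:
#   CONSISTENCY LOCAL_QUORUM;

# Before decommissioning the old nodes, close the write-loss window
# with a repair, run on each node:
nodetool repair
```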

answered Oct 12 '22 by Richard