
Achieving zero downtime Cassandra/DataStax migrations

I've got a Cassandra cluster (3 nodes, all nodes deployed to AWS) that I am trying to migrate over to a DataStax cluster. It's simply time to stop managing these nodes myself.

I have multiple producers and consumers all reading/writing data, all day long, to my Cassandra cluster. I don't have the option of putting an app/service/proxy in front of my Cassandra cluster, and then just flipping the switch cleanly so that all reads/writes go to/from my Cassandra, over to DataStax. So there's no clean way to migrate the tables one at a time. I'm also trying to achieve zero (or near zero) downtime for all producers/consumers of the data. One hard requirement: the migration cannot be lossy. No lost data!

I'm thinking the best strategy here is a four step process:

  1. Somehow, configure DataStax to be a replica of my Cassandra cluster, effectively creating streaming replication to DataStax
  2. Once DataStax is totally "caught up" with the other nodes in my Cassandra, keep the producers writing to my current Cassandra cluster, but cut the consumers/readers over to DataStax (that is, reconfigure them to connect to DataStax, and then restart them). Not zero downtime but I can probably live with a simple restart. (Again, zero downtime solutions are greatly preferred.)
  3. Cut the producers over to DataStax. Again, only near-zero-downtime, as this involves reconfiguring the producers to point to DataStax, and then requires a restart to pick up the new configs. Zero-downtime solutions would be preferred.
  4. Once replication traffic from the "old" Cassandra cluster drains to zero, we now have no "new" information that my non-DataStax nodes need to write to DataStax. Kill those nodes with fire.

This solution is the most minimally-invasive, closest-to-zero-downtime solution I can come up with, but assumes a few things:

  • Perhaps it is not possible to treat DataStax like an extra node that can be replicated to (yes/no?)
  • Perhaps Cassandra and/or DataStax have some magical features/capabilities that I don't know about, that can handle migrations better than this solution; or perhaps there are 3rd party (ideally open source) tools that could handle this better
  • I have no idea how I would monitor replication "traffic" coming from the "old" Cassandra nodes into DataStax. Would need to know how to do this before I could safely shutdown + kill the old nodes (again, can't lose data).

I guess I'm wondering if this strategy is: (1) doable/feasible, and (2) optimal; and if there are any features/tools in the Cassandra/DataStax ecosystem that I could leverage to make this any better (faster and with zero downtime).

asked Jan 27 '17 by smeeb



2 Answers

The four steps you've outlined are definitely a viable option. There's also the route of doing a simple rolling binary install: https://docs.datastax.com/en/latest-upgrade/upgrade/datastax_enterprise/upgrdCstarToDSE.html

I'll speak in the context of the steps you provided above. If you're curious about the rolling binary install, we can definitely chat about that as well.

Note: the doc links below are specific to Cassandra 3.0 (DataStax Enterprise 5.0); make sure the doc version matches your Cassandra version.

If your current major Cassandra version matches the major Cassandra version in DataStax, you should be able to add the DataStax nodes as a new DC in the same cluster your current Cassandra environment belongs to, following: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html - That will bring the existing data from the existing Cassandra DC into the DataStax DC.
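A rough sketch of the per-node setup that doc describes, with hypothetical DC/rack names (OldCassandraDC / DataStaxDC) and a scratch directory standing in for the real config path (on an actual node this would be /etc/dse/cassandra or wherever your install keeps its config):

```shell
# Sketch only: names and paths are hypothetical; adjust to your topology.
CONF=$(mktemp -d)

# GossipingPropertyFileSnitch reads the node's DC and rack from this file;
# every new DataStax node declares itself a member of the new DC.
cat > "$CONF/cassandra-rackdc.properties" <<'EOF'
dc=DataStaxDC
rack=rack1
EOF

# In cassandra.yaml: keep the SAME cluster_name as the old cluster, list
# seed nodes from BOTH DCs, and set auto_bootstrap: false so the node
# joins without streaming (nodetool rebuild does the streaming later).
cat > "$CONF/cassandra.yaml" <<'EOF'
cluster_name: 'MyCluster'
auto_bootstrap: false
endpoint_snitch: GossipingPropertyFileSnitch
EOF

grep 'dc=' "$CONF/cassandra-rackdc.properties"
```

The auto_bootstrap: false part matters: you want the nodes to join the ring quietly first, then pull data in a controlled way with nodetool rebuild.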

If you're mismatching Cassandra versions (current Cassandra is older/newer than DataStax Cassandra), then you may want to reach out to DataStax via https://academy.datastax.com/slack as the process will be more specific to your environment and can vary greatly.

As outlined in the docs, you'll want to run

ALTER KEYSPACE "your-keyspace" WITH REPLICATION =
{'class': 'NetworkTopologyStrategy', 'OldCassandraDC': 3, 'DataStaxDC': 3};

(obviously changing DC name and replication factor to your specs)

This will make sure new data from your producers will replicate to the new DataStax nodes.

You can then run nodetool rebuild -- name_of_existing_data_center from the DataStax nodes to stream data over from the existing Cassandra nodes. Depending on how much data there is, it may be somewhat time consuming, but it's the easiest, most hands-off way to do it.
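In practice that rebuild step looks something like this (DC name hypothetical; run on each new DataStax node - these commands only make sense against a live cluster):

```shell
# Run the rebuild inside screen so a dropped SSH session doesn't kill the
# stream; -dmS starts it detached under the name "rebuild".
screen -dmS rebuild nodetool rebuild -- OldCassandraDC

# Reattach with `screen -r rebuild` to see completion or errors, or watch
# streaming progress from the side; rebuild is done when netstats no
# longer shows files being received.
nodetool netstats | grep -A 8 "Mode:"
```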

You would then want to update the contact points in your producers/consumers one by one before decommissioning the old Cassandra DC.
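The cutover itself is just a config change plus restart. A minimal sketch, with a hypothetical variable name and host names - use whatever your producers/consumers actually read:

```shell
# Before the cutover the apps pointed at the old cluster, e.g.:
#   CASSANDRA_CONTACT_POINTS="old-cass-1,old-cass-2,old-cass-3"
# Switch the contact points to the DataStax DC:
export CASSANDRA_CONTACT_POINTS="dse-1,dse-2,dse-3"
echo "$CASSANDRA_CONTACT_POINTS"
```

Restart one app instance at a time so you never lose all capacity at once; since most drivers auto-discover the rest of the ring, the apps only need one reachable contact point to find every node.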

A few tips from my experience:

  • Make sure your DataStax nodes are using GossipingPropertyFileSnitch in the cassandra.yaml before starting those nodes.
  • When running nodetool rebuild, do it with screen so that you can see when it completes (or errors). Otherwise, you'd have to monitor progress using nodetool netstats and checking streaming activity.
  • Have OpsCenter up and running to monitor what's going on in the DataStax cluster during the rebuilds. You can keep an eye on streaming throughput, pending compactions, and other Cassandra specific metrics.
  • When it comes time to decommission the old DC, make sure you follow these steps: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsDecomissionDC.html
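The teardown at the end of those steps can be sketched as follows (keyspace and DC names hypothetical; this runs against a live cluster, and only after every producer/consumer points at the DataStax DC):

```shell
# 1. Make sure the new DC really holds everything: full repair on each
#    DataStax node (not just an incremental one).
nodetool repair -full

# 2. Drop the old DC from the replication map so no new writes land there.
cqlsh -e "ALTER KEYSPACE \"your-keyspace\" WITH REPLICATION =
  {'class': 'NetworkTopologyStrategy', 'DataStaxDC': 3};"

# 3. Retire each old Cassandra node in turn. With the DC already out of
#    the replication map, decommission has nothing left to stream.
nodetool decommission
```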

Hope that helps!

answered Nov 15 '22 by MarcintheCloud


I presume you mean the Datastax Managed product, where they run Cassandra for you. If you just mean "run DSE on your own AWS instances", you can do a binary upgrade in-place.

The questions you asked are best asked of Datastax - if you're going to pay them, you may as well ask them questions (that's what customers do).

Your 4 step approach is mostly pretty logical, but probably overly complex. Most Cassandra drivers will auto-discover new hosts and auto-evict old/leaving hosts, so once you have all the new Datastax Managed nodes in the cluster (assuming they allow that), you can run repair to guarantee consistency, then decommission your existing nodes - your app will keep working (isn't Cassandra great?). You'll want to update your app config/endpoints to list the new Datastax Managed nodes, but that doesn't need to be done in advance.

The one caveat here is the latency involved - going from your environment to Datastax Managed may introduce latency. In that case, you have an intermediate step you can consider where you add the Datastax Managed nodes as a different "Datacenter" within cassandra, expand the replication factor, and use LOCAL_ consistency levels to control which DC gets the queries (and then you CAN move your producers/consumers over individually).
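To illustrate that intermediate step: with both DCs in the replication map, a LOCAL_* consistency level keeps each query's coordination inside the DC the client is connected to. A hedged sketch via cqlsh (keyspace/table names hypothetical, and this assumes a reachable cluster):

```shell
# CONSISTENCY is a cqlsh session command; statements in -e run in one
# session, so the SELECT below coordinates only within the local DC.
cqlsh dse-1 -e "CONSISTENCY LOCAL_QUORUM;
SELECT * FROM my_keyspace.my_table LIMIT 1;"
```

In application code the equivalent is a DC-aware load balancing policy (e.g. DCAwareRoundRobinPolicy in the Java/Python drivers, with the local DC set) combined with a LOCAL_QUORUM or LOCAL_ONE consistency level, which is what lets you move producers and consumers over one at a time.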

answered Nov 15 '22 by Jeff Jirsa