Scylla datacenter and Cassandra datacenter in same cluster

I am running a 21-node Cassandra cluster with 150+ schemas and about 20 TB of data. I need to move the schemas and data from Cassandra to a 7-node Scylla cluster with no downtime.

Scylla and Cassandra support the same cqlsh version and are almost the same in how they distribute data and gossip.

To move the data, I am trying to create a new Scylla datacenter in the existing Cassandra cluster, update the keyspace replication to include the Scylla DC, and then bootstrap/rebuild the Scylla nodes in the cluster.

When I do this, I get a TCP connection failure after adding the seed list on the node.

Scylla error:

scylla: [shard 0] rpc - client 10.200.1.2:34236: server connection dropped: connection is closed
scylla: [shard 0] rpc - client 10.200.1.2:7000: fail to connect: Connection refused.

Cassandra error:

[MessagingService-Outgoing-/10.200.2.2-Gossip] OutboundTcpConnection.java:411 - Socket to /10.200.2.2 closed
[HANDSHAKE-/10.200.2.2] OutboundTcpConnection.java:570 - Cannot handshake version with /10.200.2.2
[HANDSHAKE-/10.200.2.2] OutboundTcpConnection.java:561 - Handshaking version with /10.200.2.2

Please help if anyone has already done this, or has a better idea for moving the data without downtime or data loss and with less risk.

sachin asked Apr 14 '20 10:04



2 Answers

You cannot have a heterogeneous cluster with C* and Scylla nodes in the same cluster.

Create a separate Scylla cluster, create the schema there, change the app to do double writes (to both clusters), and then migrate the historical C* data to Scylla.
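The double-write step above could look roughly like the following. This is only a minimal sketch, not a production pattern: `DualWriter` and `RecordingSession` are hypothetical names, and the only assumption about the real driver sessions is that they expose an `execute(query, parameters)` method. The stand-in sessions let the sketch run without a live cluster.

```python
class RecordingSession:
    """Hypothetical stand-in for a driver session; records or rejects writes."""

    def __init__(self, fail=False):
        self.fail = fail
        self.calls = []

    def execute(self, query, parameters=()):
        if self.fail:
            raise RuntimeError("cluster unreachable")
        self.calls.append((query, tuple(parameters)))


class DualWriter:
    """Sends every write to the old cluster first, then to the new one."""

    def __init__(self, primary, secondary):
        self.primary = primary      # existing Cassandra session (source of truth)
        self.secondary = secondary  # new Scylla session
        self.failed = []            # writes that did not reach the secondary

    def write(self, query, parameters=()):
        # A failure on the primary propagates to the caller unchanged.
        result = self.primary.execute(query, parameters)
        try:
            self.secondary.execute(query, parameters)
        except Exception as exc:
            # Keep the write for a later replay instead of failing the request.
            self.failed.append((query, tuple(parameters), repr(exc)))
        return result


cassandra = RecordingSession()
scylla = RecordingSession(fail=True)  # simulate the new cluster being down
writer = DualWriter(cassandra, scylla)
writer.write("INSERT INTO demo_ks.demo_tbl (id, val) VALUES (%s, %s)", (1, "a"))
```

Treating the old cluster as the source of truth and queueing failed secondary writes for replay is one design choice among several; the table name `demo_ks.demo_tbl` is made up for illustration.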

There are multiple ways to migrate the data; this video should help: https://youtu.be/CDOesdWDT9Y. Migrating with no downtime is not a problem; there are options for that too.
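One thing to watch when copying historical data while double writes are running: an old row copied later should not overwrite a newer value the app has already written. A common way to handle that in CQL is to write the copied row with its original write time via `USING TIMESTAMP`. The sketch below assumes the rows were read with something like `SELECT id, val, WRITETIME(val) FROM demo_ks.demo_tbl`; the table, the columns, and the `RecordingTarget` stand-in are all hypothetical, and it ignores TTLs and per-column timestamps.

```python
# Hypothetical table and columns; the timestamp is microseconds since the epoch.
INSERT_CQL = (
    "INSERT INTO demo_ks.demo_tbl (id, val) VALUES (%s, %s) USING TIMESTAMP %s"
)


class RecordingTarget:
    """Hypothetical stand-in for the Scylla session; records what it is sent."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=()):
        self.calls.append((query, tuple(parameters)))


def copy_rows_with_timestamp(rows, target, insert_cql=INSERT_CQL):
    """Write each (id, val, writetime_us) row to target with its original timestamp."""
    copied = 0
    for row_id, val, writetime_us in rows:
        target.execute(insert_cql, (row_id, val, writetime_us))
        copied += 1
    return copied


target = RecordingTarget()
historical_rows = [(1, "a", 1586858640000000), (2, "b", 1586858641000000)]
copied_count = copy_rows_with_timestamp(historical_rows, target)
```

Because the copied rows carry their original timestamps, a newer double-written value for the same key still wins on the Scylla side.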

Moreno Garcia answered Oct 16 '22 10:10


While Scylla is compatible with Cassandra across several axes (SSTables, CQL/Drivers, etc.), Scylla did need to make some changes to the gossip protocol which make it impossible to join a Cassandra cluster. There is no known way to join Scylla to a Cassandra cluster.

Scylla has published several suggested techniques for migration.

Blog describing the techniques: https://www.scylladb.com/2019/04/02/spark-file-transfer-and-more-strategies-for-migrating-data-to-and-from-a-cassandra-or-scylla-cluster/

Webinar walking through the migration techniques [requires registration]: https://go.scylladb.com/wbn-spark-scylla-migration-strategies-registration.html

Documentation: https://docs.scylladb.com/operating-scylla/procedures/cassandra_to_scylla_migration_process/

Community Slack for Q&A: http://slack.scylladb.com

Greg answered Oct 16 '22 09:10