Extending Cassandra cluster with datacenter in China (CGF)

I need to extend my cluster with a new datacenter in mainland China, behind the Great Firewall. Currently I have datacenters in the US and Europe, so the cluster already meets the requirements of the Geographical Location Scenario.

At this point I have the Chinese infrastructure ready for Cassandra, but the network statistics from the past few days are a bit troubling and I am somewhat worried: could this affect my current cluster, and will the new datacenter be functional at all?

My actual questions regarding this are:

  • How does Cassandra handle heavy packet loss during replication? (occasionally up to 40%)
  • How does it affect the cluster when the network connection between two datacenters is really bad (only a few kilobits/sec and latency as above) for hours?
    • Will the Chinese DC be considered dead? Or will Cassandra still try to use the limited bandwidth?
    • Can this cause any problems in the non-Chinese datacenters? E.g. they get slow, which results in client request timeouts.
  • Is it possible to enforce somehow that only one of my non-Chinese datacenters communicates with the Chinese one? Or should I trust that Cassandra will handle this? (I am trying to avoid possible harm to all my datacenters.)
  • Is there any way to speed up the initial data replication (nodetool rebuild)? At the current speed it would take weeks to replicate our current data.

Any suggestions or remarks are welcome, thanks!

Andrea Nagy asked Jul 23 '18 12:07


1 Answer

How does Cassandra handle huge packet-loss during replication? (occasionally up to 40%)

Usually packet loss will cause a large number of read repairs. In some cases it can cause requests to fail, depending on the replication factor and consistency level. Also, be prepared for very costly repairs, which will create a lot of tiny SSTables and a substantial amount of IO.
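To make the dependence on replication factor and consistency level concrete, here is a rough sketch of how many replica acknowledgements a request waits for. The datacenter names (us, eu, cn) and the replication factors are assumptions for illustration, not values from the question.

```python
def quorum(replicas: int) -> int:
    """Quorum size for a given number of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(consistency: str, rf_by_dc: dict, local_dc: str) -> dict:
    """Return how many replica acknowledgements a request needs, and from where.

    rf_by_dc maps a datacenter name to its replication factor.
    The key "any" means the replicas may be in any datacenter.
    """
    if consistency == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own datacenter are waited for.
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if consistency == "EACH_QUORUM":
        # A quorum is needed in every datacenter, including the slow one.
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    if consistency == "QUORUM":
        # A quorum over the sum of all replication factors.
        return {"any": quorum(sum(rf_by_dc.values()))}
    raise ValueError(f"unsupported consistency level: {consistency}")


# Hypothetical layout: three replicas in each of three datacenters.
rf = {"us": 3, "eu": 3, "cn": 3}
print(required_acks("LOCAL_QUORUM", rf, "us"))
print(required_acks("QUORUM", rf, "us"))
print(required_acks("EACH_QUORUM", rf, "us"))
```

Under these assumed numbers, LOCAL_QUORUM requests never wait on the remote link, QUORUM can still be satisfied by the US and European replicas together, and EACH_QUORUM is the level that fails or times out when the Chinese replicas are unreachable.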

I would suggest running a test in a development environment to see the actual behavior in your system. There are plenty of tools to simulate a bad network.
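One such tool on Linux is tc with the netem queueing discipline. The helper below is only a sketch that builds the command line without running it; the interface name, the 300 ms delay and the 64 kbit/s rate are placeholder assumptions, and only the 40% loss figure comes from the question.

```python
from typing import List, Optional


def netem_command(interface: str, loss_percent: int, delay_ms: int,
                  rate_kbit: Optional[int] = None) -> List[str]:
    """Build (but do not run) a `tc ... netem` command that degrades a link."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem",
           "delay", f"{delay_ms}ms", "loss", f"{loss_percent}%"]
    if rate_kbit is not None:
        # Optional bandwidth cap, to mimic a link of only a few kilobits/sec.
        cmd += ["rate", f"{rate_kbit}kbit"]
    return cmd


# Placeholder values: eth0, 300 ms delay and 64 kbit/s are assumptions.
print(" ".join(netem_command("eth0", loss_percent=40, delay_ms=300, rate_kbit=64)))
```

The printed command needs root privileges to run on a test node, and `tc qdisc del dev eth0 root` removes the rule again.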

How does it affect the cluster when the network connection between two datacenters is really bad (only a few kilobits/sec and latency as above) for hours? Will the Chinese DC be considered dead? Or will Cassandra still try to use the limited bandwidth? Can this cause any problems in the non-Chinese datacenters?

It largely depends on how bad the connection is and on the consistency level and replication factor you are running with. In some cases it will just cause rather high latency between datacenters. However, if the connection is bad enough that nodes start marking each other as down, then you are looking at issues in all datacenters. Your existing datacenters will struggle with performance because of requests timing out. This in turn causes requests to be held in memory longer, which can lead to GC pressure. (It can cause a number of other issues in your other datacenters as well.)

The sensitivity of the failure detector can be adjusted and fine-tuned to suit your use case. phi_convict_threshold is a setting that, when raised, decreases the likelihood of a node being marked as down. You can find more about that here. If you find the sweet spot where your nodes are not marked down for being unresponsive, Cassandra can make use of what little bandwidth it has to work with.
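As a rough illustration of what the threshold controls, the sketch below uses a simplified phi accrual model in which heartbeat gaps are treated as exponentially distributed. This is an approximation made for illustration, not a copy of Cassandra's implementation, and the 1-second mean heartbeat interval is an assumption.

```python
import math

# Converts a natural-log likelihood into a base-10 "phi" value.
PHI_FACTOR = 1.0 / math.log(10.0)


def phi(seconds_since_last_heartbeat: float, mean_interval_s: float) -> float:
    """Suspicion level under the simplified exponential model (an assumption)."""
    return PHI_FACTOR * seconds_since_last_heartbeat / mean_interval_s


def silence_before_conviction(threshold: float, mean_interval_s: float) -> float:
    """Seconds without a heartbeat before phi exceeds the given threshold."""
    return threshold * mean_interval_s / PHI_FACTOR


# Assumed mean heartbeat interval of 1 second; 8 is the usual default threshold.
for threshold in (8, 12):
    seconds = silence_before_conviction(threshold, mean_interval_s=1.0)
    print(f"phi_convict_threshold={threshold}: about {seconds:.1f} s of silence")
```

In this model a higher threshold simply lengthens the silence a node tolerates before convicting a peer, and a link whose observed mean heartbeat interval is already long stretches that window further.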

Is it possible to enforce somehow that only one of my non-Chinese datacenters communicates with the Chinese one? Or should I trust that Cassandra will handle this? (I am trying to avoid possible harm to all my datacenters.)

There is not really a way to tell Cassandra to limit which datacenters it speaks to. You are more or less stuck with communication between all the datacenters you include in your replication settings.
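What can be controlled is which datacenters hold replicas of a given keyspace, through NetworkTopologyStrategy. The sketch below only builds the CQL text; the keyspace and datacenter names are hypothetical, and gossip would still run between all nodes of the cluster regardless of replication settings.

```python
def alter_keyspace_replication(keyspace: str, rf_by_dc: dict) -> str:
    """Build a CQL statement that replicates a keyspace only to the listed datacenters."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {options}}};")


# Hypothetical names: a keyspace kept out of the Chinese datacenter,
# and one that is replicated there as well.
print(alter_keyspace_replication("internal_metrics", {"us": 3, "eu": 3}))
print(alter_keyspace_replication("user_profiles", {"us": 3, "eu": 3, "cn": 3}))
```

Keyspaces that leave out the slow datacenter generate no replication traffic toward it, which at least limits how much data has to cross the bad link.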

Is there any way to speed up the initial data replication (nodetool rebuild)? At the current speed it would take weeks to replicate our current data.

I would recommend against using sstableloader, because it works much like rebuild does and requires a snapshot to operate. If the network is what is causing the slow speed, then changing the way of streaming is not going to make much difference.

In my opinion, the first thing to do would be to measure where the bottleneck is in your system. If the slow network really is the bottleneck, you could add more nodes to stream from more sources at the same time, but ultimately you will still be hampered by the slow network connection.
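A quick back-of-the-envelope estimate helps to judge whether the link alone explains a rebuild that takes weeks. The figures below are made-up placeholders, and the efficiency factor is a crude assumption standing in for retransmissions and protocol overhead.

```python
def transfer_days(data_gb: float, bandwidth_kbit_s: float,
                  efficiency: float = 1.0) -> float:
    """Estimate days needed to move data_gb over a link of bandwidth_kbit_s.

    efficiency is the usable fraction of the link (0 < efficiency <= 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_gb * 8 * 10**9                      # decimal gigabytes to bits
    seconds = bits / (bandwidth_kbit_s * 1000 * efficiency)
    return seconds / 86400                          # seconds per day


# Placeholder numbers: 100 GB over a 1000 kbit/s link.
print(f"{transfer_days(100, 1000):.1f} days at full link speed")
print(f"{transfer_days(100, 1000, efficiency=0.5):.1f} days at 50% efficiency")
```

If the estimate for your real data size and measured throughput is already in the range of weeks, streaming from more sources will not change the outcome much, because every stream shares the same cross-border link.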

JayK answered Oct 16 '22 12:10