How to deploy zookeeper across multiple data centers and failover?

I would like to know about the existing approaches that are available when running Zookeeper across data centers?

One approach I found after doing some research is to use observers: keep a single voting ensemble (leader and followers) in the main data center and run only observers in the backup data center. When the main data center crashes, we pick the other data center as the new main one and convert its observers to leader/followers manually.
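For illustration, here is a minimal sketch of that topology in zoo.cfg terms (the hostnames are hypothetical). The manual failover step would then amount to editing the server list on the backup-DC nodes, removing the :observer suffix and peerType, dropping the dead DC's servers, and restarting so a leader can be elected there.

    # zoo.cfg (same server list on every node) -- hypothetical hostnames
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181

    # main data center: voting ensemble (leader + followers)
    server.1=zk1.dc-main.example.com:2888:3888
    server.2=zk2.dc-main.example.com:2888:3888
    server.3=zk3.dc-main.example.com:2888:3888

    # backup data center: non-voting observers
    server.4=zk1.dc-backup.example.com:2888:3888:observer
    server.5=zk2.dc-backup.example.com:2888:3888:observer

    # each observer node additionally sets:
    # peerType=observer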

I would like to know about better approaches to achieve the same.

Thanks

asked Jan 19 '17 by Tharindu Kumara

1 Answer

First I would like to point out the cons of your solution, which my solution hopefully addresses:

a) in case of a main data center failure, the recovery process is manual (to quote you: "convert observers to leader/follower manually")
b) only the main data center accepts writes -> in case of failure, either all data (when observers don't write logs) or only the last updates (when observers do write logs) are lost

Because the question is about data centerS, I'll assume we have enough DCs to reach our objective: solving a. and b. while keeping a usable multi-data-center distributed ZK.

So, when having an even number of data centers (DCs), one could use an additional DC just to get an odd number of ZK nodes in the ensemble. With e.g. 2 DCs, a 3rd one could be added; each DC could then contain 1 rwZK (read-write ZK node) or, for better tolerance against failures, 3 rwZKs organized as hierarchical quorums (both cases could also benefit from ZK observers). Inside a DC, all ZK clients should point only to that DC's ZK group, so the only traffic remaining between DCs would be for e.g. leader elections and writes. With this kind of setup one solves both a. and b., but loses write/recovery performance, because writes/elections must be agreed between data centers: at least 2 DCs must agree on each write/election, with a 2-ZK-node agreement per DC (see hierarchical quorums). The intra-DC agreement should be fast enough that it won't matter much for the overall write agreement; bottom line, approximately only the delay between DCs matters. The disadvantages of this approach are (a config sketch follows the list):
- additional cost for the 3rd data center: this could be mitigated by using the company office (a guy did that) as the 3rd data center
- lost sessions because of inter-DC network latency and/or throughput: with high enough timeouts one could reach a maximum possible write throughput (depending on the average inter-DC network speed), so this solution is valid only when that maximum is acceptable. Still, when using 1 rwZK per DC, I guess there won't be much difference from your solution, because writes from the backup DC to the main DC must travel between DCs anyway; but your solution involves no inter-DC write agreements or leader-election communication, so it's faster.
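As a sketch of the hierarchical-quorum variant (3 rwZKs per DC across 3 DCs; the hostnames are hypothetical, and the group/weight syntax is the one from the ZooKeeper hierarchical quorums documentation):

    # zoo.cfg -- one voting group per data center (hypothetical hostnames)
    server.1=zk1.dc1.example.com:2888:3888
    server.2=zk2.dc1.example.com:2888:3888
    server.3=zk3.dc1.example.com:2888:3888
    server.4=zk1.dc2.example.com:2888:3888
    server.5=zk2.dc2.example.com:2888:3888
    server.6=zk3.dc2.example.com:2888:3888
    server.7=zk1.dc3.example.com:2888:3888
    server.8=zk2.dc3.example.com:2888:3888
    server.9=zk3.dc3.example.com:2888:3888

    # groups: a write/election needs a majority of groups,
    # each contributing a majority of its own (weighted) votes
    group.1=1:2:3
    group.2=4:5:6
    group.3=7:8:9

    # equal weights for all servers
    weight.1=1
    weight.2=1
    weight.3=1
    weight.4=1
    weight.5=1
    weight.6=1
    weight.7=1
    weight.8=1
    weight.9=1

Clients in dc1 would then use a connection string containing only the dc1 servers, e.g. zk1.dc1.example.com:2181,zk2.dc1.example.com:2181,zk3.dc1.example.com:2181.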

Another consideration:

Regardless of the chosen solution, the inter-DC communication should be secured; ZK offers no solution for this, so tunneling or another approach must be implemented.
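Purely as an illustration of the tunneling idea (the gateway host and user are assumptions, and in practice a site-to-site VPN or stunnel setup is usually more manageable), one could forward a remote node's quorum and election ports over SSH:

    # forward the quorum (2888) and election (3888) ports of a remote
    # ZK node through an SSH tunnel (hypothetical hosts/user)
    ssh -f -N \
        -L 2889:zk1.dc2.example.com:2888 \
        -L 3889:zk1.dc2.example.com:3888 \
        tunnel@gateway.dc2.example.com
    # the local zoo.cfg entry for that remote server could then point
    # at the forwarded localhost ports instead of the remote address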

UPDATE

Another solution would be to still use an additional 3rd DC (or the company office), but keep only the rwZKs there (1, 3 or another odd number), while the other 2 DCs have only observer ZKs. The clients should still connect only to their own DC's ZK servers, but we no longer need hierarchical quorums. The gain here is that write agreements and leader elections happen only inside the DC with the rwZKs (let's call it the arbiter DC). The disadvantages are (a config sketch follows the list):
- the arbiter DC is a single point of failure
- the write requests will still have to travel from observer DCs to arbiter DC
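A sketch of this arbiter-DC layout (hypothetical hostnames; dc3 is the arbiter holding the only voting nodes):

    # zoo.cfg -- only the arbiter DC (dc3) votes; dc1/dc2 hold observers
    server.1=zk1.dc3.example.com:2888:3888
    server.2=zk2.dc3.example.com:2888:3888
    server.3=zk3.dc3.example.com:2888:3888

    # dc1 and dc2: observers only; local clients connect to these
    server.4=zk1.dc1.example.com:2888:3888:observer
    server.5=zk2.dc1.example.com:2888:3888:observer
    server.6=zk1.dc2.example.com:2888:3888:observer
    server.7=zk2.dc2.example.com:2888:3888:observer

    # each observer node additionally sets:
    # peerType=observer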

answered Oct 12 '22 by adrhc