 

Kafka Multi-Datacenter with high availability

Tags:

apache-kafka

I'm setting up two Kafka v0.10.1.0 clusters in different DCs and planning to use MirrorMaker to keep one as the source and the other as the target. What I'm not sure about is how to ensure high availability when my source/main cluster goes down (the whole DC hosting the source Kafka cluster goes down). Do I need to make my application switch to producing messages to the target Kafka? What will happen when the source Kafka is back, and how do I bring it back in sync with the possibly lost messages?

Thanks

Al Elizalde asked Mar 10 '23 18:03


1 Answer

From reading your question, I'm afraid I don't think MirrorMaker will be a suitable tool for your needs.

Basically MirrorMaker is simply a Consumer and a Producer tied together to replicate messages from one cluster to another. It is not a tool to tie two Kafka clusters together in an active-active configuration, which sounds a lot like what you are looking for.
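To make that concrete, here is a stripped-down sketch of the concept (this is not MirrorMaker's actual code, just the idea; the cluster addresses, topic name and group id are placeholders I made up): a consumer polls the source cluster and a producer forwards each record to the target cluster.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MiniMirror {
    public static void main(String[] args) {
        // Consumer reading from the source cluster (address is a placeholder)
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "source-dc-kafka:9092");
        cProps.put("group.id", "mini-mirror");
        cProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        cProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Producer writing to the target cluster (address is a placeholder)
        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "target-dc-kafka:9092");
        pProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        pProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // Read a batch from the source and forward it to the target, topic name unchanged
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    producer.send(new ProducerRecord<>(record.topic(), record.key(), record.value()));
                }
            }
        }
    }
}

The real tool adds offset commits and multiple consumer streams on top of this, but data only ever flows in one direction per MirrorMaker instance.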

But to answer your questions in order:

Do I need to make my application switch to producing messages to the target Kafka?

Yes. There is currently no failover function, so you would need to implement logic in your producers that switches to the target cluster after x failed messages, or after no messages have been sent for y minutes, or something along those lines.
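As a rough illustration of what such logic could look like (the addresses, threshold and wrapper class are all made up, and this is not a built-in Kafka feature), something along these lines might work:

import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FailoverProducer {
    private static final int FAILURE_THRESHOLD = 5; // made-up value for "x failed messages"

    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile Producer<String, String> current = buildProducer("source-dc-kafka:9092"); // placeholder

    public void send(String topic, String value) {
        if (consecutiveFailures.get() >= FAILURE_THRESHOLD) {
            switchToTarget();
        }
        current.send(new ProducerRecord<>(topic, value), (metadata, exception) -> {
            if (exception == null) {
                consecutiveFailures.set(0);
            } else {
                consecutiveFailures.incrementAndGet();
            }
        });
    }

    private synchronized void switchToTarget() {
        if (consecutiveFailures.get() < FAILURE_THRESHOLD) {
            return; // another thread already switched
        }
        // Close the producer pointing at the source cluster and replace it with
        // one pointing at the target cluster.
        current.close();
        current = buildProducer("target-dc-kafka:9092"); // placeholder address
        consecutiveFailures.set(0);
    }

    private static Producer<String, String> buildProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }
}

Keep in mind that messages produced to the source shortly before the outage may not have been mirrored yet, which leads into your third question.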

What will happen when the source Kafka is back?

Pretty much nothing that you don't implement yourself :) MirrorMaker will start replicating data from your source cluster to your target cluster again, but since your producers have now switched over to the target cluster, the source cluster isn't receiving any new data, so the MirrorMaker instances will just idle along. Your producers will keep producing into the target cluster unless you implement a regular check for whether the source has come back online and have them switch back.
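If you want the producers to switch back on their own, that regular check could be as simple as periodically trying to fetch topic metadata from the source cluster, along these lines (the address, topic and interval are placeholders):

import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;

public class SourceClusterWatcher {

    // Returns true if topic metadata can be fetched from the source cluster within 5 seconds.
    static boolean sourceIsReachable() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "source-dc-kafka:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("max.block.ms", "5000"); // don't hang forever on a dead cluster
        try (KafkaProducer<String, String> probe = new KafkaProducer<>(props)) {
            probe.partitionsFor("my-topic"); // throws a TimeoutException if metadata can't be fetched
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (sourceIsReachable()) {
                // Source DC is back: trigger whatever switch-back procedure you have,
                // e.g. point the FailoverProducer sketch above back at the source cluster.
                System.out.println("Source cluster reachable again");
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}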

How do I bring it back in sync with the possibly lost messages?

When your source cluster is back online, and assuming everything I mentioned above has happened, you have effectively switched your clusters around. Depending on whether you want the source to remain the primary cluster that gets written to, or are happy to reverse the roles when this happens, there are two options I can come up with off the top of my head:

  • Reverse the direction of MirrorMaker and set its consumer group offsets manually so that it picks up at the point where the source cluster died (a rough sketch of this follows below).
  • Stop producing new data for a while, copy the missing data back to the source cluster, switch your producers back, and start everything up again.

Both options require you to manually figure out what data is missing on the source cluster though; I don't think there is a way around that.
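For the first option: as far as I know there is no out-of-the-box tooling for resetting consumer group offsets in 0.10.1.0, but you can commit arbitrary offsets for the reversed MirrorMaker's group with a small throwaway consumer like the sketch below. Do this while MirrorMaker is stopped; the group id, topic, partition, offset and address are all placeholders you would have to fill in yourself.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetMirrorMakerOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The reversed MirrorMaker consumes from the target cluster, so the offsets
        // live there; group.id must match the one in MirrorMaker's consumer config.
        props.put("bootstrap.servers", "target-dc-kafka:9092");  // placeholder address
        props.put("group.id", "mirror-maker-group");              // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition partition = new TopicPartition("my-topic", 0); // placeholder topic/partition
        long firstUnreplicatedOffset = 123456L; // you have to determine this value yourself

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // assign() rather than subscribe() so we can commit without joining the group
            consumer.assign(Collections.singletonList(partition));
            consumer.commitSync(Collections.singletonMap(
                    partition, new OffsetAndMetadata(firstUnreplicatedOffset)));
        }
    }
}

Repeat this for every partition, pointing each offset at the first message on the target that never made it back to the source.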


Bottom line is that this is not an easy thing to do with MirrorMaker, and it might be worth having another think about whether you really want to switch producers over to the target cluster if the source goes down.

You could also have a look at Confluent's Replicator, which might better suit what you are looking for and is part of their commercial offering. Information is a bit sparse on that; let me know if you are interested and I can make an introduction to someone who can tell you more about it (or of course just send a mail to Confluent, that will reach the right person as well).

Sönke Liebau answered Apr 09 '23 09:04