 

Backup/restore kafka and zookeeper

I am running a simple 3-node Kafka cluster with a 5-node ZooKeeper ensemble behind it. I would like to know what a good way of backing up my Kafka is, and the same for my ZooKeeper.

For the moment I just export my data directory to an S3 bucket...

Thanks.

asked Dec 13 '17 by starttter


People also ask

Do you backup Kafka?

Kafka Backup is a tool to back up and restore your Kafka data including all (configurable) topic data and especially also consumer group offsets. To the best of our knowledge, Kafka Backup is the only viable solution to take a cold backup of your Kafka data and restore it correctly.

Is ZooKeeper still used in Kafka?

Usually, Kafka uses Zookeeper to store and manage all the metadata information about Kafka clusters. Kafka also uses Zookeeper as a centralized controller that manages and organizes all the Kafka brokers or servers.

Is ZooKeeper deprecated in Kafka?

ZooKeeper would be deprecated in the release after that, and removed in Kafka 4.0. Targeted for August, Kafka 3.3 would include options for both ZooKeeper and KRaft. The end-of-life date for ZooKeeper is undetermined.

What is the relationship between Kafka and ZooKeeper?

Kafka uses Zookeeper to manage service discovery for the Kafka brokers that form the cluster. Zookeeper sends changes of the topology to Kafka, so each node in the cluster knows when a new broker joined, a broker died, a topic was removed, a topic was added, etc.


1 Answer

Zalando has recently published a pretty good article on how to back up Kafka and Zookeeper. Generally, there are 2 paths for Kafka backup:

  • Maintain a second Kafka cluster, to which all topics are replicated. I haven't verified this setup, but if the offset topics are also replicated, then switching to the other cluster shouldn't harm consumers' processing state.
  • Dump topics to cloud storage, e.g. using the S3 connector (as described by Zalando). In case of restore, you recreate the topics and feed them with data from your cloud storage. This allows you to make a point-in-time restore, but consumers would have to start reading the topic from the beginning.
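The second path boils down to a Kafka Connect sink configuration. A minimal sketch, using the property names of Confluent's S3 sink connector; the topic list, bucket name and region are placeholders for your own setup:

```properties
# Sketch of a Kafka Connect S3 sink (Confluent's S3 connector).
# Bucket, region and topic names are placeholders.
name=s3-backup-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=orders,payments
s3.bucket.name=my-kafka-backup
s3.region=eu-central-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1000
```

`flush.size` controls how many records are buffered before an object is written to S3, which effectively sets the granularity of your point-in-time restore.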

The preferred backup solution will depend on your use case. E.g. for streaming applications, the first solution may give you less pain, while when using Kafka for event sourcing, the second solution may be more desirable.
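For the first path, the MirrorMaker tool that ships with Kafka can replicate all topics to the second cluster. A hedged sketch; the two properties files (pointing the consumer at the source cluster and the producer at the backup cluster) are assumed to exist:

```shell
# Replicate every topic from the source cluster to the backup cluster.
# consumer.properties points at the source brokers, producer.properties
# at the backup brokers; file names here are placeholders.
bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist '.*'
```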

Regarding Zookeeper, Kafka keeps information about topics there (persistent store), and also uses it for broker discovery and leader election (ephemeral data). Zalando settled on using Burry, which simply iterates over the Zookeeper tree structure, dumps it to a file structure, then zips it and pushes it to cloud storage. It suffers from a little problem, but most probably it does not impact the backup of Kafka's persistent data (TODO verify). Zalando describes there that when restoring, it is better to first create the Zookeeper cluster, then connect a new Kafka cluster to it (with new, unique broker IDs), and only then restore Burry's backup. Burry will not overwrite existing nodes, so it will not restore the ephemeral information about the old brokers that is stored in the backup.
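The core of Burry's approach — walking the Zookeeper tree and dumping each znode to a file — can be sketched in a few lines of Python. This is an illustrative reimplementation, not Burry's actual code; it uses the `get`/`get_children` calls that a kazoo `KazooClient` provides, and the output layout (one directory per znode with a `content` file) is an assumption:

```python
import os

def dump_tree(zk, zpath, out_dir):
    """Recursively copy znode data under `zpath` into `out_dir`.

    `zk` is anything with kazoo-style get(path) -> (data, stat)
    and get_children(path) -> [names] methods.
    """
    data, _stat = zk.get(zpath)
    # Mirror the znode path as a directory, with its bytes in `content`.
    node_dir = os.path.join(out_dir, zpath.lstrip("/"))
    os.makedirs(node_dir, exist_ok=True)
    with open(os.path.join(node_dir, "content"), "wb") as f:
        f.write(data or b"")
    for child in zk.get_children(zpath):
        child_path = ("" if zpath == "/" else zpath) + "/" + child
        dump_tree(zk, child_path, out_dir)
```

In real use you would connect with `kazoo.client.KazooClient(hosts="zk1:2181")`, call `zk.start()`, run `dump_tree(zk, "/", "./zk-backup")`, and then zip and upload the directory — which is essentially what Burry automates, including the cloud-storage push.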

Note: Although they mention the usage of Exhibitor, it is not actually needed for backup when backing up with Burry.

answered Sep 18 '22 by krzychu