I am running a simple 3-node Kafka cluster and a 5-node Zookeeper ensemble to run Kafka. I would like to know the good way to back up my Kafka, and the same for my Zookeeper. For the moment I just export my data directory to an S3 bucket...
Thanks.
Kafka Backup is a tool to back up and restore your Kafka data, including all (configurable) topic data and, in particular, consumer group offsets. To the best of our knowledge, Kafka Backup is the only viable solution to take a cold backup of your Kafka data and restore it correctly.
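As an illustration of what such a cold backup has to capture (not the kafka-backup tool's own interface), here is a minimal sketch that drains a topic's records to a file and saves a consumer group's committed offsets. It assumes the kafka-python client, a broker at localhost:9092, and placeholder topic/group names "orders" and "orders-app":

```python
# Minimal sketch of a cold backup: dump one topic's records and one
# consumer group's committed offsets to local files.
# Assumptions: kafka-python installed; broker, topic and group names are placeholders.
import json
from kafka import KafkaConsumer
from kafka.admin import KafkaAdminClient

BOOTSTRAP = "localhost:9092"

# 1) Cold-copy the topic data.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=5000,   # stop iterating once the topic is drained
)
with open("orders.backup.jsonl", "w") as out:
    for record in consumer:
        out.write(json.dumps({
            "partition": record.partition,
            "offset": record.offset,
            "key": record.key.decode() if record.key else None,
            "value": record.value.decode() if record.value else None,
        }) + "\n")
consumer.close()

# 2) Save the consumer group's committed offsets, so consumers can resume
#    from the right position after a restore.
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
offsets = admin.list_consumer_group_offsets("orders-app")
with open("orders-app.offsets.json", "w") as out:
    json.dump({f"{tp.topic}-{tp.partition}": om.offset for tp, om in offsets.items()}, out)
admin.close()
```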
Kafka uses Zookeeper to store and manage all the metadata about the cluster. Kafka also relies on Zookeeper to elect the centralized controller broker that manages and organizes all the other brokers.
ZooKeeper would be deprecated in a subsequent release and removed in Kafka 4.0. Kafka 3.3, targeted for August, would include options for both ZooKeeper and KRaft; an end-of-life date for ZooKeeper support is not yet set.
Kafka uses Zookeeper to manage service discovery for the Kafka brokers that form the cluster. Zookeeper sends topology changes to Kafka, so each node in the cluster knows when a new broker has joined, a broker has died, a topic was removed, a topic was added, etc.
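To make that concrete, the sketch below (assuming the Python kazoo client and a placeholder Zookeeper address zk1:2181) lists the ephemeral broker registrations and the persistent topic metadata that Kafka keeps in Zookeeper:

```python
# Peek at the Zookeeper nodes Kafka uses for service discovery.
# Live brokers register ephemeral znodes under /brokers/ids;
# topic metadata lives under /brokers/topics.
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")   # placeholder Zookeeper address
zk.start()

for broker_id in zk.get_children("/brokers/ids"):
    data, _stat = zk.get(f"/brokers/ids/{broker_id}")
    info = json.loads(data)
    print(f"broker {broker_id}: {info.get('endpoints')}")   # ephemeral: vanishes when the broker dies

print("topics:", zk.get_children("/brokers/topics"))        # persistent topic metadata

zk.stop()
```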
Zalando recently published a pretty good article on how to back up Kafka and Zookeeper. Generally there are two paths for Kafka backup:

1. Mirror the cluster into a second, backup Kafka cluster (e.g. with MirrorMaker).
2. Continuously copy topic data out to external storage (e.g. to S3 via a Kafka Connect sink).
The preferred backup solution will depend on your use case. For example, for streaming applications the first solution may give you less pain, while when using Kafka for event sourcing the second solution may be more desirable.
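To give a sense of what the first path does, here is a hedged sketch of a MirrorMaker-style copy loop using kafka-python; the topic name "orders" and the two bootstrap addresses are placeholders, and a real setup would run MirrorMaker itself rather than hand-rolled code:

```python
# Sketch of cluster mirroring: consume from the live cluster and re-produce
# every record into a backup cluster. Topic and addresses are placeholders.
from kafka import KafkaConsumer, KafkaProducer

source = KafkaConsumer(
    "orders",
    bootstrap_servers="live-kafka:9092",
    group_id="backup-mirror",          # committed offsets let the mirror resume where it stopped
    auto_offset_reset="earliest",
)
target = KafkaProducer(bootstrap_servers="backup-kafka:9092")

for record in source:
    # Preserve key and partition so per-key ordering survives the copy.
    target.send("orders", key=record.key, value=record.value, partition=record.partition)
```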
Regarding Zookeeper, Kafka keeps information there about topics (persistent store), as well as for broker discovery and leader election (ephemeral nodes). Zalando settled on using Burry, which simply iterates over the Zookeeper tree structure, dumps it to a file structure, and then zips it and pushes it to cloud storage. It suffers from a small problem, but most probably it does not impact the backup of Kafka's persistent data (TODO verify). Zalando describes there that when restoring, it is better to first create the Zookeeper cluster, then connect a new Kafka cluster to it (with new, unique broker IDs), and then restore Burry's backup. Burry will not overwrite existing nodes, so it will not restore the ephemeral information about old brokers that is stored in the backup.
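The sketch below illustrates the Burry-style approach described above (walk the Zookeeper tree, dump each znode to a file, zip the result for upload). It uses the Python kazoo client with a placeholder Zookeeper address and output directory, and is only an illustration of the idea, not Burry itself:

```python
# Burry-style Zookeeper dump: recursively copy every znode's data into a
# local directory tree, then zip it so it can be pushed to cloud storage.
import os
import shutil
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")   # placeholder Zookeeper address
zk.start()

def dump(znode_path, out_dir):
    """Recursively mirror znodes under znode_path into out_dir, one file per znode."""
    data, _stat = zk.get(znode_path)
    target = os.path.join(out_dir, znode_path.lstrip("/") or "root")
    os.makedirs(target, exist_ok=True)
    with open(os.path.join(target, "data"), "wb") as f:
        f.write(data or b"")
    for child in zk.get_children(znode_path):
        child_path = "/" + child if znode_path == "/" else f"{znode_path}/{child}"
        dump(child_path, out_dir)

dump("/", "zk-dump")
zk.stop()

# Zip the dump; the archive could then be pushed to S3 or other storage.
shutil.make_archive("zk-backup", "zip", "zk-dump")
```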
Note: although they mention the usage of Exhibitor, it is not really needed for backups taken with Burry.