I am running a simple 3-node Kafka cluster and a 5-node Zookeeper ensemble to run Kafka. I would like to know the good way to back up my Kafka, and the same for my Zookeeper. For the moment I just export my data directory to an S3 bucket...
Thanks.
Kafka Backup is a tool to back up and restore your Kafka data, including all (configurable) topic data and, in particular, consumer group offsets. To the best of our knowledge, Kafka Backup is the only viable solution to take a cold backup of your Kafka data and restore it correctly.
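As an illustration of what such a cold backup has to capture (not the kafka-backup tool's own interface), here is a minimal sketch that drains a topic's records to a file and saves a consumer group's committed offsets. It assumes the kafka-python client, a broker at localhost:9092, and placeholder topic/group names "orders" and "orders-app":

```python
# Minimal sketch of a cold backup: dump one topic's records and one
# consumer group's committed offsets to local files.
# Assumptions: kafka-python installed; broker, topic and group names are placeholders.
import json
from kafka import KafkaConsumer
from kafka.admin import KafkaAdminClient

BOOTSTRAP = "localhost:9092"

# 1) Cold-copy the topic data.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=5000,   # stop iterating once the topic is drained
)
with open("orders.backup.jsonl", "w") as out:
    for record in consumer:
        out.write(json.dumps({
            "partition": record.partition,
            "offset": record.offset,
            "key": record.key.decode() if record.key else None,
            "value": record.value.decode() if record.value else None,
        }) + "\n")
consumer.close()

# 2) Save the consumer group's committed offsets, so consumers can resume
#    from the right position after a restore.
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
offsets = admin.list_consumer_group_offsets("orders-app")
with open("orders-app.offsets.json", "w") as out:
    json.dump({f"{tp.topic}-{tp.partition}": om.offset for tp, om in offsets.items()}, out)
admin.close()
```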
Kafka uses Zookeeper to store and manage all the metadata about the cluster. Kafka also relies on Zookeeper to elect the centralized controller broker that manages and organizes all the other brokers.
ZooKeeper would be deprecated in a subsequent release and removed in Kafka 4.0. Kafka 3.3, targeted for August, would include options for both ZooKeeper and KRaft; an end-of-life date for ZooKeeper support is not yet set.
Kafka uses Zookeeper to manage service discovery for the Kafka brokers that form the cluster. Zookeeper sends topology changes to Kafka, so each node in the cluster knows when a new broker has joined, a broker has died, a topic was removed, a topic was added, etc.
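To make that concrete, the sketch below (assuming the Python kazoo client and a placeholder Zookeeper address zk1:2181) lists the ephemeral broker registrations and the persistent topic metadata that Kafka keeps in Zookeeper:

```python
# Peek at the Zookeeper nodes Kafka uses for service discovery.
# Live brokers register ephemeral znodes under /brokers/ids;
# topic metadata lives under /brokers/topics.
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")   # placeholder Zookeeper address
zk.start()

for broker_id in zk.get_children("/brokers/ids"):
    data, _stat = zk.get(f"/brokers/ids/{broker_id}")
    info = json.loads(data)
    print(f"broker {broker_id}: {info.get('endpoints')}")   # ephemeral: vanishes when the broker dies

print("topics:", zk.get_children("/brokers/topics"))        # persistent topic metadata

zk.stop()
```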
Zalando recently published a pretty good article on how to back up Kafka and Zookeeper. Generally there are two paths for Kafka backup:

1. Mirror the cluster into a second, backup Kafka cluster (e.g. with MirrorMaker).
2. Continuously copy topic data out to external storage (e.g. to S3 via a Kafka Connect sink).
The preferred backup solution will depend on your use case. For example, for streaming applications the first solution may give you less pain, while when using Kafka for event sourcing the second solution may be more desirable.
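To give a sense of what the first path does, here is a hedged sketch of a MirrorMaker-style copy loop using kafka-python; the topic name "orders" and the two bootstrap addresses are placeholders, and a real setup would run MirrorMaker itself rather than hand-rolled code:

```python
# Sketch of cluster mirroring: consume from the live cluster and re-produce
# every record into a backup cluster. Topic and addresses are placeholders.
from kafka import KafkaConsumer, KafkaProducer

source = KafkaConsumer(
    "orders",
    bootstrap_servers="live-kafka:9092",
    group_id="backup-mirror",          # committed offsets let the mirror resume where it stopped
    auto_offset_reset="earliest",
)
target = KafkaProducer(bootstrap_servers="backup-kafka:9092")

for record in source:
    # Preserve key and partition so per-key ordering survives the copy.
    target.send("orders", key=record.key, value=record.value, partition=record.partition)
```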
Regarding Zookeeper, Kafka keeps information there about topics (persistent store), as well as for broker discovery and leader election (ephemeral nodes). Zalando settled on using Burry, which simply iterates over the Zookeeper tree structure, dumps it to a file structure, and then zips it and pushes it to cloud storage. It suffers from a small problem, but most probably it does not impact the backup of Kafka's persistent data (TODO verify). Zalando describes there that when restoring, it is better to first create the Zookeeper cluster, then connect a new Kafka cluster to it (with new, unique broker IDs), and then restore Burry's backup. Burry will not overwrite existing nodes, so it will not restore the ephemeral information about old brokers that is stored in the backup.
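The sketch below illustrates the Burry-style approach described above (walk the Zookeeper tree, dump each znode to a file, zip the result for upload). It uses the Python kazoo client with a placeholder Zookeeper address and output directory, and is only an illustration of the idea, not Burry itself:

```python
# Burry-style Zookeeper dump: recursively copy every znode's data into a
# local directory tree, then zip it so it can be pushed to cloud storage.
import os
import shutil
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")   # placeholder Zookeeper address
zk.start()

def dump(znode_path, out_dir):
    """Recursively mirror znodes under znode_path into out_dir, one file per znode."""
    data, _stat = zk.get(znode_path)
    target = os.path.join(out_dir, znode_path.lstrip("/") or "root")
    os.makedirs(target, exist_ok=True)
    with open(os.path.join(target, "data"), "wb") as f:
        f.write(data or b"")
    for child in zk.get_children(znode_path):
        child_path = "/" + child if znode_path == "/" else f"{znode_path}/{child}"
        dump(child_path, out_dir)

dump("/", "zk-dump")
zk.stop()

# Zip the dump; the archive could then be pushed to S3 or other storage.
shutil.make_archive("zk-backup", "zip", "zk-dump")
```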
Note: although they mention the usage of Exhibitor, it is not really needed for backups taken with Burry.