I want to describe the following case that happened on one of our production clusters.
We have an Ambari cluster with HDP version 2.6.4.
The cluster includes 3 Kafka machines, and each Kafka broker has a 5 TB disk.
What we saw is that all Kafka disks were at 100% usage, so the Kafka disks were full, and this is the reason that all the Kafka brokers failed.
df -h /kafka
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 5T 5T 23M 100% /var/kafka
After investigation we saw that log.retention.hours was set to 7 days (168 hours).
So it seems that purging happens only after 7 days, and maybe this is the reason that the Kafka disks are 100% full even though they are huge (5 TB).
What we want to do now is figure out how to avoid this case in the future.
So:
We want to know how to avoid the Kafka disks reaching 100% used capacity.
What do we need to set in the Kafka config in order to purge the Kafka disks according to the disk size? Is that possible?
And how do we know the right value for log.retention.hours? Should it be based on the disk size, or on something else?
Furthermore, Kafka uses heap space very carefully and does not require heap sizes of more than 6 GB. This leaves room for a file system cache of up to 28-30 GB on a 32 GB machine.
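As a sketch, on most installations the broker heap is set through the KAFKA_HEAP_OPTS environment variable (on an Ambari/HDP cluster this is typically edited via the kafka-env template); the value below is just an illustration for a 32 GB machine:

# illustrative kafka-env.sh fragment: cap the broker heap at 6 GB and leave the rest to the OS page cache
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"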
Kafka wraps compressed messages together: producers sending compressed messages compress the batch and send it as the payload of a wrapper message. As before, the data on disk is exactly the same as what the broker receives from the producer over the network and sends to its consumers.
Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance. Accumulating more stored data doesn't make it slower. There are Kafka clusters running in production with over a petabyte of stored data.
Kafka allows users to configure retention limits on topics. The retention.bytes configuration is the total number of bytes allowed for messages in each partition of the topic. Once it is exceeded, Kafka will delete the oldest messages.
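For example (the topic name and ZooKeeper address below are placeholders), a per-topic override can be set with the kafka-configs.sh tool that ships with Kafka:

# illustrative only: cap every partition of "my-topic" at roughly 1 GB
kafka-configs.sh --zookeeper zk1.example.com:2181 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config retention.bytes=1073741824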
In Kafka, there are two types of log retention: size-based and time-based. The former is triggered by log.retention.bytes, while the latter by log.retention.hours.
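For reference, a minimal sketch of the broker-level defaults in server.properties (the values are illustrative, not a recommendation); whichever limit is reached first triggers deletion:

# time-based retention: delete segments older than 7 days
log.retention.hours=168
# size-based retention: delete the oldest segments once a partition exceeds ~50 GB
log.retention.bytes=53687091200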
In your case, you should pay attention to size retention, which can sometimes be quite tricky to configure. Assuming that you want a delete cleanup policy, you'd need to set the following parameters:
log.cleaner.enable=true
log.cleanup.policy=delete
Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take into consideration the following factors:
log.retention.bytes is a minimum guarantee for a single partition of a topic, meaning that if you set log.retention.bytes to 512 MB, you will always have at least 512 MB of data (per partition) on your disk.
Again, if you set log.retention.bytes to 512 MB and log.retention.check.interval.ms to 5 minutes (which is the default value), then at any given time you can have at least 512 MB of data plus the data produced within the 5-minute window, before the retention policy is triggered.
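To make that concrete, here is a rough worst-case calculation assuming a hypothetical produce rate of 2 MB/s into a single partition:

log.retention.bytes             = 512 MB
log.retention.check.interval.ms = 300000 (5 minutes)
data produced between checks    = 2 MB/s * 300 s = 600 MB
worst case per partition        ~ 512 MB + 600 MB = ~1.1 GB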
A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on disk (2 segments that have reached the retention limit and a 3rd, active segment that data is currently being written to).
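Put differently, with those illustrative values the worst case per partition is roughly retention.bytes plus one extra (active) segment:

log.retention.bytes = 1 GB
log.segment.bytes   = 512 MB
worst case on disk  ~ 2 closed segments (1 GB) + 1 active segment (up to 512 MB) = ~1.5 GB per partition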
Finally, you should do the math and compute the maximum size that might be reserved by the Kafka logs at any given time on your disk, and tune the aforementioned parameters accordingly. Of course, I would also advise setting a time retention policy as well and configuring log.retention.hours accordingly. If after 2 days you don't need your data anymore, then set log.retention.hours=48.
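Applied to your 5 TB disks, a sketch of that math, assuming (hypothetically) around 100 partitions hosted per broker and keeping ~20% of the disk free as headroom:

usable space per broker  ~ 5 TB * 0.8 = 4 TB of log data
budget per partition     ~ 4 TB / 100 partitions = ~40 GB
log.retention.bytes      = 42949672960   (~40 GB per partition)
log.retention.hours      = 48            (if the data is not needed after 2 days)

The partition count and headroom above are assumptions for illustration; plug in your actual number of partitions per broker.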