
Kafka: how to calculate the value of log.retention.bytes

Tags:

apache-kafka

One of the main roles of the log.retention.bytes parameter is to keep the Kafka disk from filling up, in other words to purge log data before the disk is full.

According to the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_kafka-component-guide/content/kafka-broker-settings.html

log.retention.bytes is the amount of data to retain in the log for each topic partition. By default, log size is unlimited.

The documentation also notes that this limit applies to each partition, so multiply this value by the number of partitions to calculate the total data retained for the topic.

To understand it better, here is a small example (hands-on is always better).

On the Kafka machine, under /var/kafka/kafka-logs, we have the following topic partitions. The topic name is lop.avo.prt.prlop.

Example of the topic partitions under /var/kafka/kafka-logs:

lop.avo.prt.prlop-1
lop.avo.prt.prlop-2
lop.avo.prt.prlop-3
lop.avo.prt.prlop-4
lop.avo.prt.prlop-5
lop.avo.prt.prlop-6
lop.avo.prt.prlop-7
lop.avo.prt.prlop-8
lop.avo.prt.prlop-9
lop.avo.prt.prlop-10

Under each partition we have the following logs (example):

4.0K    00000000000000023657.index
268K    00000000000000023657.log
4.0K    00000000000000023657.timeindex
4.0K    00000000000000023854.index
24K     00000000000000023854.log
4.0K    00000000000000023854.timeindex

The cluster has 3 Kafka machines (3 brokers). About Kafka storage: each Kafka machine has a disk of 100G.

Let's say that we want to purge the topic's logs when disk usage reaches 70% of the total disk.

Now let's try to calculate the value of log.retention.bytes from the information above.

Because we have 10 topic partitions and we want to limit total disk usage to 70G, my assumption is to calculate it as follows:

Each partition is limited to 7G, and 7G translated to bytes is 7516192768 bytes.

7G x 10 = 70G (70% of the total disk)

So it seems that log.retention.bytes should be set to 7516192768, in order to limit each partition to 7516192768 bytes.
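
The arithmetic above can be checked with a short Python sketch. It assumes that "G" means 2**30 bytes (which is what the 7516192768 figure implies) and that the 10 partitions share the 70% budget equally. The variable names are only illustrative.

```python
GIB = 1024 ** 3  # assumption: 1G = 2**30 bytes, matching the 7516192768 figure

disk_bytes = 100 * GIB   # disk size of one Kafka machine
target_percent = 70      # purge threshold as a percentage of the disk
partitions = 10          # partition directories under /var/kafka/kafka-logs

# Integer arithmetic avoids floating-point rounding on large byte counts.
budget_bytes = disk_bytes * target_percent // 100
per_partition_bytes = budget_bytes // partitions

print(f"log.retention.bytes={per_partition_bytes}")
```

With these inputs the sketch prints log.retention.bytes=7516192768, the same value as the manual calculation.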

Is my assumption logical?

If not, what is the right calculation of log.retention.bytes, given that the Kafka disk is 100G and we have only 10 topic partitions under /var/kafka/kafka-logs?


asked Oct 29 '18 06:10

Judy

1 Answer

You are on the right track. Just a couple of things to keep in mind:

log.retention.bytes defines how much data Kafka will ensure is available, so it is a lower bound. The maximum size on disk can be hard to calculate exactly because it depends on a number of settings such as segment and index sizes, segment roll time, and the cleaner interval (most log.* settings). See Kafka retention policies for some more details.

Planning for 70% of total disk usage is a good idea, but in practice I'd still recommend monitoring your disk usage to avoid surprises.

Based on your calculation, you are likely to require changes if you want to add partitions. Also note that replicas have to be counted, so if you create 1 new partition with replication factor 3, 3 brokers will need to have the space available.
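
To make the two points above concrete, here is a rough Python sketch of a per-broker estimate. It rests on several assumptions that are not in the question: replicas are spread evenly across brokers, size-based retention removes whole segments so each partition can overshoot log.retention.bytes by about one segment, and log.segment.bytes is left at its default of 1 GiB. Index files and growth between retention checks are ignored. The function name is hypothetical, and the result should be treated as an approximation, not a guarantee.

```python
GIB = 1024 ** 3


def estimated_peak_bytes_per_broker(partitions, replication_factor, brokers,
                                    retention_bytes, segment_bytes=GIB):
    """Rough estimate of log data on the busiest broker (hypothetical helper).

    Assumes replicas are spread evenly and that each partition may hold
    about one segment more than retention_bytes before old data is deleted.
    """
    total_replicas = partitions * replication_factor
    # Ceiling division: the busiest broker holds this many replicas.
    replicas_per_broker = -(-total_replicas // brokers)
    return replicas_per_broker * (retention_bytes + segment_bytes)


# 10 partitions, replication factor 3 (assumed), 3 brokers, 7 GiB retention.
peak = estimated_peak_bytes_per_broker(10, 3, 3, 7 * GIB)
print(f"estimated peak per broker: {peak / GIB:.0f} GiB")

# Adding one partition with replication factor 3 adds a replica to every broker.
peak_after = estimated_peak_bytes_per_broker(11, 3, 3, 7 * GIB)
print(f"after adding one partition: {peak_after / GIB:.0f} GiB")
```

Under these assumptions the estimate works out to 80 GiB per broker, which is already above the 70G target, and to 88 GiB after adding one partition. This is why a smaller log.retention.bytes or a smaller segment size may be needed, and why monitoring is still worthwhile.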


answered Nov 09 '22 04:11

Mickael Maison
