One of the main roles of the log.retention.bytes parameter is to keep the Kafka disk from filling up, in other words to purge data logs before the disk is full.
According to the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_kafka-component-guide/content/kafka-broker-settings.html
log.retention.bytes: the amount of data to retain in the log for each topic partition. By default, log size is unlimited.
Note also that this is the limit for each partition, so multiply this value by the number of partitions to calculate the total data retained for the topic.
To understand it better, let's work through a small example (hands-on is always better).
On the Kafka machine, under /var/kafka/kafka-logs, we have the following topic partitions. The topic name is lop.avo.prt.prlop.
Example of topic partitions under /var/kafka/kafka-logs:
lop.avo.prt.prlop-1
lop.avo.prt.prlop-2
lop.avo.prt.prlop-3
lop.avo.prt.prlop-4
lop.avo.prt.prlop-5
lop.avo.prt.prlop-6
lop.avo.prt.prlop-7
lop.avo.prt.prlop-8
lop.avo.prt.prlop-9
lop.avo.prt.prlop-10
Under each partition we have the following log files (example):
4.0K 00000000000000023657.index
268K 00000000000000023657.log
4.0K 00000000000000023657.timeindex
4.0K 00000000000000023854.index
24K 00000000000000023854.log
4.0K 00000000000000023854.timeindex
The cluster has 3 Kafka machines (3 brokers). Regarding storage: each Kafka machine has a disk of 100G.
Let's say that we want to purge the topic's logs when disk usage reaches 70% of the total disk,
so now let's try to calculate the value of log.retention.bytes from the information above.
Because we have 10 topic partitions and we want to limit the total disk usage to 70G,
my assumption is to calculate it as follows:
each partition is limited to 7G, and 7G in bytes is 7516192768
7G x 10 = 70G (70% of the total disk)
So it seems that log.retention.bytes should be set to 7516192768 in order to limit each partition to 7516192768 bytes.
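The arithmetic above can be double-checked with a short sketch. The function name and its parameters are illustrative only (they are not part of Kafka), and the sketch assumes that "G" means GiB (1024^3 bytes), that all 10 partitions sit on the same 100G disk, and that data is spread evenly across them.

```python
GIB = 1024 ** 3  # assumption: the "G" in the question is treated as GiB


def retention_bytes_per_partition(disk_bytes, usage_percent, partitions):
    """Split a disk budget (a percentage of the disk) evenly across partitions.

    Hypothetical helper for illustration; integer math avoids float rounding.
    """
    budget = disk_bytes * usage_percent // 100  # e.g. 70% of the disk
    return budget // partitions


per_partition = retention_bytes_per_partition(100 * GIB, 70, 10)
print(per_partition)  # candidate value for log.retention.bytes
```

With the numbers from the question this gives 7516192768, the same value as the manual calculation.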
Is my assumption logical?
If not, what is the right way to calculate log.retention.bytes, given that the Kafka disk is 100G and we have only 10 topic partitions under /var/kafka/kafka-logs?
So for example, if you are generally sending 200MB a day of messages to a single-partition topic and you want to keep them for 5 days, you would set retention.bytes to 1GB (200MB x 5 days). If the same traffic were spread over 10 partitions, you would set retention.bytes = 100MB (1GB / 10 partitions).
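As a rough sketch of that rule of thumb, the helper below (a hypothetical name, not a Kafka API) derives a per-partition retention.bytes value from daily volume and retention days. It assumes decimal megabytes, to match the 200MB x 5 = 1GB arithmetic, and an even spread of messages across partitions.

```python
MB = 1000 ** 2  # assumption: decimal units, so 5 x 200MB is exactly 1GB


def topic_retention_bytes(daily_bytes, days, partitions=1):
    """Per-partition retention.bytes needed to hold `days` of traffic.

    Hypothetical helper; assumes messages are spread evenly over partitions.
    """
    return daily_bytes * days // partitions


single = topic_retention_bytes(200 * MB, 5)               # one partition
spread = topic_retention_bytes(200 * MB, 5, partitions=10)
print(single, spread)
```

If the partitions are unevenly loaded (for example because of skewed message keys), the busiest partition would need a larger value than this even split suggests.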
If the log retention is set to five days, then a published message is available for consumption for five days after it is published. After that time, the message is discarded to free up space. Kafka's performance is effectively constant with respect to data size, so retaining lots of data is not a problem.
The most common way to configure how long Kafka retains messages is by time. The default is specified in the configuration file using the log.retention.hours parameter, and it is set to 168 hours, the equivalent of one week.
Each Kafka topic partition log is stored on disk as a series of segment files. A segment grows until it reaches the size set by log.segment.bytes (1 GiB by default in Apache Kafka; some distributions and hosted services use smaller values) before a new segment file is created. It's possible to have multiple segment files in a partition replica at any one time.
You are on the right track. Just a couple of things to keep in mind:
log.retention.bytes defines how much data Kafka will ensure is available. So this is a lower bound limit. The maximum size on disk can be hard to calculate exactly, as it depends on a number of settings like segment and index sizes, segment roll time and cleaner interval (most log.* settings). See Kafka retention policies for some more details.
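To give a feel for why this is a lower bound, here is a rough sketch, not an exact model of the broker. It assumes that size-based retention only deletes whole segments, so a partition can exceed log.retention.bytes by roughly one segment (log.segment.bytes, 1 GiB by default in Apache Kafka). Index files and data written between retention checks are ignored, and the function names are hypothetical.

```python
GIB = 1024 ** 3
DEFAULT_SEGMENT_BYTES = 1 * GIB  # Apache Kafka default for log.segment.bytes


def approx_max_partition_bytes(retention_bytes, segment_bytes=DEFAULT_SEGMENT_BYTES):
    """Rough estimate of on-disk size: the retained data plus about one segment.

    Assumption for illustration: ignores index files and retention check delay.
    """
    return retention_bytes + segment_bytes


def retention_for_budget(partition_budget_bytes, segment_bytes=DEFAULT_SEGMENT_BYTES):
    """Pick a retention value that leaves about one segment of headroom."""
    if partition_budget_bytes <= segment_bytes:
        raise ValueError("budget must be larger than one segment")
    return partition_budget_bytes - segment_bytes


print(approx_max_partition_bytes(7 * GIB) // GIB)  # 8
print(retention_for_budget(7 * GIB) // GIB)        # 6
```

Under these assumptions, a 7G setting could let each partition grow to about 8G, so a 7G per-partition budget would suggest a setting nearer 6G, or a smaller log.segment.bytes.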
Planning for 70% of total disk usage is a good idea, but in practice I'd still recommend monitoring your disk usage to avoid surprises.
Based on your calculation, you are likely to need changes if you want to add partitions. Also note that replicas have to be counted, so if you create 1 new partition with replication factor 3, 3 brokers will need to have the space available.
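A minimal sketch of how replicas change the picture, assuming replicas are spread evenly across brokers (Kafka aims for this, but it is not guaranteed). The function names are hypothetical and the replication factor of 3 is taken from the example above, not from the question.

```python
import math

GIB = 1024 ** 3


def replicas_per_broker(partitions, replication_factor, brokers):
    """Partition replicas hosted by the busiest broker, assuming an even spread."""
    return math.ceil(partitions * replication_factor / brokers)


def per_replica_budget(disk_budget_bytes, replicas):
    """Share of one broker's disk budget available to each hosted replica."""
    return disk_budget_bytes // replicas


hosted = replicas_per_broker(partitions=10, replication_factor=3, brokers=3)
print(hosted, per_replica_budget(70 * GIB, hosted))

# Adding one partition means one more replica on every broker.
hosted_after = replicas_per_broker(partitions=11, replication_factor=3, brokers=3)
print(hosted_after, per_replica_budget(70 * GIB, hosted_after))
```

With 10 partitions, replication factor 3 and 3 brokers, every broker hosts 10 replicas, so the 7G per-partition figure still holds. After adding an eleventh partition the per-replica share of the 70G budget drops below 7G.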