Kafka - Retention period Parameter

Tags:

apache-kafka

I'm trying to understand the logic behind the retention period in Apache Kafka. Please help me understand what happens in the scenarios below.

  1. If the retention period is set to 0, what will happen? Will all records be deleted?
  2. If we delete the retention parameter itself, will it take the default value?
Karthikeyan Rasipalay Durairaj asked Oct 24 '25 19:10


1 Answer

  1. Kafka doesn't allow you to set the retention period to zero in units of hours (log.retention.hours); it has to be at least 1. If you set it to zero, you'll get the following error message, and the broker won't start.

java.lang.IllegalArgumentException: requirement failed: log.retention.ms must be unlimited (-1) or, equal or greater than 1

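Because the check is on the effective value in milliseconds, as the message suggests, setting log.retention.minutes or log.retention.ms to zero in server.properties should fail the same way. A retention of zero can still be set per topic, through the topic-level config retention.ms.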

  • Now, let's come to the point of data deletion. In this situation, the old data most likely won't get deleted even after the configured retention (say 1 hour, or 1 minute) has expired, because one more variable in server.properties, log.segment.bytes, plays a major role here. The value of log.segment.bytes is 1 GB by default. Kafka only performs deletion on a closed segment, so a log segment is closed only once it has reached 1 GB, and only after that does retention kick in. You therefore need to reduce log.segment.bytes to a value that is at most the cumulative ingestion volume of the data you plan to retain for that short duration. E.g. if your retention period is 10 minutes and you get roughly 1 MB of data per minute, you can set log.segment.bytes=10485760, which is 1024 x 1024 x 10. You can find an example of how retention depends on both data ingestion and time in this thread.
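The sizing arithmetic in the example above can be sketched as a small shell helper. This is only an illustration: the function name segment_bytes_for is hypothetical, and it assumes a steady ingest rate, which real traffic rarely has.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kafka): estimate a segment size that
# fills up within one retention window, assuming a steady ingest rate.
#   $1 = ingest rate in bytes per minute
#   $2 = retention window in minutes
segment_bytes_for() {
  local bytes_per_minute=$1 retention_minutes=$2
  echo $(( bytes_per_minute * retention_minutes ))
}

# Roughly 1 MB per minute, retained for 10 minutes
segment_bytes_for $((1024 * 1024)) 10   # prints 10485760
```

The result is an upper bound in the sense used above: a smaller value closes segments sooner, so expired data becomes eligible for deletion sooner, at the cost of more segment files.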

  • To test this, we can try a small experiment. Let's start Zookeeper and Kafka, create a topic called test, and change its retention period to zero.

    nohup ./zookeeper-server-start.sh ../config/zookeeper.properties &
    nohup ./kafka-server-start.sh ../config/server.properties &
    ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
    # The topic-level key is retention.ms (log.retention.ms is the broker-level name)
    ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test --alter --add-config retention.ms=0
    
  • Now if we insert sufficient records using kafka-console-producer, we'll see that the records are not deleted even after 2-3 minutes. But now, let's change the topic's segment.bytes (the topic-level counterpart of log.segment.bytes) to 100 bytes.

    ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test --alter --add-config segment.bytes=100
    
  • Now, almost immediately we'll see that old records are getting deleted from Kafka.
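Why the smaller segment size makes the difference can be shown with a deliberately simplified model of the rule described above: a segment is a deletion candidate only when it is closed and older than the retention period. This is not Kafka's actual code, and the function name and arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# Simplified model of the rule described in this answer, not Kafka's real logic.
#   $1 = "closed" or "active"
#   $2 = segment age in milliseconds
#   $3 = retention in milliseconds (-1 means unlimited)
segment_is_deletable() {
  local state=$1 age_ms=$2 retention_ms=$3
  [ "$state" = "closed" ] || return 1      # the active segment is never deleted
  [ "$retention_ms" -ge 0 ] || return 1    # -1 disables time-based deletion
  [ "$age_ms" -gt "$retention_ms" ]        # deletable once older than retention
}

# Retention of 0 ms, records written 3 minutes ago
segment_is_deletable active 180000 0 && echo "delete" || echo "keep"   # prints keep
segment_is_deletable closed 180000 0 && echo "delete" || echo "keep"   # prints delete
```

With the default 1 GB segment, a few test records stay in the active segment, so the first case applies. With segment.bytes=100, segments close after a handful of records and the second case applies.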

  2. Yes. As with every Kafka parameter in server.properties, if we delete or comment out a property, the default value for that property kicks in. The default retention period is 1 week (log.retention.hours=168).
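As a rough sketch of how the broker-level default is resolved, as I understand the broker documentation: log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours, and hours defaults to 168 when nothing is set. The helper below is hypothetical and only mirrors that precedence; pass an empty string for a property that is not set.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented precedence:
#   log.retention.ms > log.retention.minutes > log.retention.hours (default 168)
#   $1 = log.retention.ms, $2 = log.retention.minutes, $3 = log.retention.hours
#   Pass "" for any property that is not set in server.properties.
effective_retention_ms() {
  local ms=$1 minutes=$2 hours=$3
  if [ -n "$ms" ]; then
    echo "$ms"
  elif [ -n "$minutes" ]; then
    echo $(( minutes * 60 * 1000 ))
  else
    echo $(( ${hours:-168} * 60 * 60 * 1000 ))
  fi
}

effective_retention_ms "" "" ""    # nothing set: prints 604800000 (7 days)
effective_retention_ms "" 10 24    # minutes wins over hours: prints 600000
```

A topic-level retention.ms override, when present, applies to that topic instead of the broker value computed here.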
Bitswazsky answered Oct 26 '25 12:10


