I have a question about Kafka topic cleanup policies and how they interact with log.retention.
For example, if I set cleanup.policy to compact, does compaction only start after the topic's retention time has passed, or does retention time have no effect on compaction?
Second part of the question: if I use compact,delete together and set log.retention to, say, 1 day, will the topic be compacted all the time but its content deleted after one day? Or are compaction and deletion both applied after one day?
Thanks for any answers.
log.cleanup.policy=compact: This policy is the default for Kafka's __consumer_offsets topic. With this policy on a topic, Kafka stores only the most recent value for each key. Setting the policy to compact only makes sense on topics where applications produce events that contain both a key and a value.
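To make the "most recent value per key" semantics concrete, here is a minimal sketch in plain Java (not Kafka code itself): replaying a keyed log into a map keeps only the latest value for each key, which is the state a compacted topic converges to. The keys and values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    // A keyed record, standing in for a Kafka message with key and value.
    record Record(String key, String value) {}

    public static void main(String[] args) {
        List<Record> log = List.of(
                new Record("user-1", "v1"),
                new Record("user-2", "v1"),
                new Record("user-1", "v2")); // newer value for user-1

        // Later occurrences of a key replace earlier ones, as compaction does.
        Map<String, String> compacted = new LinkedHashMap<>();
        for (Record r : log) {
            compacted.put(r.key(), r.value());
        }
        System.out.println(compacted); // {user-1=v2, user-2=v1}
    }
}
```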
Log retention (garbage collection) is a cleanup strategy that discards (deletes) old log segments when their retention time or size limit has been reached. By default there is only a time limit and no size limit. Retention time is controlled by the cluster-wide settings log.retention.ms, log.retention.minutes, and log.retention.hours. The most common configuration for how long Kafka retains messages is by time: the default is specified in the broker configuration using the log.retention.hours parameter, and it is set to 168 hours, the equivalent of one week.
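You can inspect what a given topic is effectively configured with using the Java AdminClient. Here is a sketch, assuming a broker at localhost:9092 and a hypothetical topic named my-topic; retention.ms is the per-topic counterpart of the broker-wide log.retention.* settings.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Map;
import java.util.Set;

public class ShowTopicCleanupConfig {
    public static void main(String[] args) throws Exception {
        Map<String, Object> props =
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // describeConfigs returns the effective topic config,
            // including values inherited from the broker defaults.
            Config config = admin.describeConfigs(Set.of(topic)).all().get().get(topic);
            System.out.println("retention.ms   = "
                    + config.get(TopicConfig.RETENTION_MS_CONFIG).value());
            System.out.println("cleanup.policy = "
                    + config.get(TopicConfig.CLEANUP_POLICY_CONFIG).value());
        }
    }
}
```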
The config log.cleanup.policy can take a value among delete, compact, or compact,delete. Log compaction is performed by the log cleaner, a pool of background compaction threads.
You can use a job queue entry to apply retention policies to delete data automatically, or you can manually apply policies. To apply a retention policy automatically, just create and enable a policy. When you enable a policy we create a job queue entry that will apply retention policies according to the retention period you specify.
A string that is either "delete" or "compact" or both. This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.
Log segments can be deleted or compacted, or both, to manage their size. The topic-level configuration cleanup.policy determines the way the log segments for the topic are managed.

Log cleanup by compaction

If the topic-level configuration cleanup.policy is set to compact, the log for the topic is compacted periodically in the background by the log cleaner. In a compacted topic, the log only needs to contain the most recent message for each key, while earlier messages can be discarded. There is no need to set log.retention to -1 or any other value: your topics will be compacted, and old messages will never be deleted (as per the compaction rules). Note that only inactive log segments can be compacted; the active segment is never compacted.
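For example, a compact-only topic can be created like this with the Java AdminClient. The broker address, topic name, and partition/replication counts below are placeholders.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Map<String, Object> props =
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // cleanup.policy=compact: the log cleaner keeps the latest value
            // per key; retention time plays no role in removing records here.
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```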
Log cleanup by using both

You can specify both the delete and compact values for the cleanup.policy configuration at the same time. In this case, the log is compacted, but the cleanup process also follows the retention time or size limit settings: the latest value for each key is kept by compaction, while records older than the retention limit are deleted regardless.
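This is exactly your second scenario: with compact,delete and one day of retention, the topic is compacted continuously, and anything older than one day is deleted as well. A sketch of setting this on an existing, hypothetical topic named events, with retention.ms = 86400000 (one day in milliseconds):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collection;
import java.util.List;
import java.util.Map;

public class EnableCompactAndDelete {
    public static void main(String[] args) throws Exception {
        Map<String, Object> props =
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "events");
            // Compact continuously, but also delete anything older than 1 day.
            Collection<AlterConfigOp> ops = List.of(
                    new AlterConfigOp(
                            new ConfigEntry(TopicConfig.CLEANUP_POLICY_CONFIG,
                                            "compact,delete"),
                            AlterConfigOp.OpType.SET),
                    new AlterConfigOp(
                            new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG,
                                            "86400000"), // 24 h in ms
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```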
I would suggest you go through the following links:
https://ibm.github.io/event-streams/installing/capacity-planning/
https://kafka.apache.org/documentation/#compaction
https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist