
Kafka log compaction: consuming data

I'm reading about log compaction in the latest release of Kafka and am curious how it impacts consumers. Do consumers work the same as they always have, or is there a new process for getting all the latest values?

With 'standard' Kafka topics, I use a consumer group to maintain a pointer to the most recent values. But if Kafka is retaining values based on keys instead of time, how will consumer groups work?

ethrbunny asked Aug 18 '16

People also ask

How does Kafka log compaction work?

The Kafka documentation says: "Log compaction is a mechanism to give finer-grained per-record retention, rather than the coarser-grained time-based retention. The idea is to selectively remove records where we have a more recent update with the same primary key."
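
In practice, compaction is enabled per topic by setting cleanup.policy=compact. A minimal sketch using Kafka's Java Admin client; the broker address localhost:9092 and the topic name "latest-values" are assumptions for illustration:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            try (Admin admin = Admin.create(props)) {
                // "cleanup.policy=compact" switches the topic from time-based
                // retention to per-key retention of the latest value
                NewTopic topic = new NewTopic("latest-values", 1, (short) 1)
                        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                        TopicConfig.CLEANUP_POLICY_COMPACT));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }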

How do you trigger log compaction in Kafka?

There are multiple ways in which you can achieve compaction for your data logs:

Method 1: Using the traditional method of discarding old data.
Method 2: Storing old logs in a compressed format.
Method 3: Kafka log compaction, a hybrid approach.
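
As a rough illustration, each method corresponds to real topic-level configuration keys; the values below are examples, not recommendations:

    import java.util.Map;

    public class RetentionExamples {
        // Method 1: traditional time-based retention -- old segments are discarded
        static final Map<String, String> DISCARD_OLD_DATA = Map.of(
                "cleanup.policy", "delete",
                "retention.ms", "604800000"); // e.g. keep data for 7 days

        // Method 2: keep old logs, but store them compressed
        static final Map<String, String> COMPRESSED_STORAGE = Map.of(
                "compression.type", "gzip");

        // Method 3: the hybrid -- compact per key, and also delete very old data
        static final Map<String, String> HYBRID_COMPACTION = Map.of(
                "cleanup.policy", "compact,delete");
    }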

What is a log compacted topic?

With log compaction, older messages for a key are removed from the topic partition and only the most recently written message for each key is retained.
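
A toy simulation (plain Java, no Kafka involved) may make that rule concrete: replaying a log into a map and keeping only the last write per key produces exactly the compacted view:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class CompactionToy {
        public static void main(String[] args) {
            // An append-only log: (key, value) pairs in write order
            String[][] log = {{"k1", "v1"}, {"k2", "v1"}, {"k1", "v2"}};
            Map<String, String> compacted = new LinkedHashMap<>();
            for (String[] record : log) {
                // remove-then-put orders entries by their *latest* write,
                // mirroring how compaction keeps the surviving record's position
                compacted.remove(record[0]);
                compacted.put(record[0], record[1]);
            }
            System.out.println(compacted); // {k2=v1, k1=v2}
        }
    }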

What is max.compaction.lag.ms?

max.compaction.lag.ms - the maximum delay between the time a message is written and the time the message becomes eligible for compaction. This configuration parameter overrides min.cleanable.dirty.ratio and forces a log segment to become compactable even if the "dirty ratio" is lower than the threshold.
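
For illustration, here is a sketch of setting this on an existing topic with the Java Admin client; the broker address, the topic name, and the 1-hour value are assumptions:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class SetMaxCompactionLag {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic =
                        new ConfigResource(ConfigResource.Type.TOPIC, "latest-values");
                // Force records to become eligible for compaction within 1 hour,
                // regardless of the dirty ratio
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("max.compaction.lag.ms", "3600000"),
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
            }
        }
    }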


1 Answer

It does not affect how consumers work. If you read the whole topic, you might still see "duplicates" for a key (either because not all duplicates have been eliminated yet, or because new messages were written after the last compaction run). So if you are only interested in the latest value per key, simply keep the last value you see for each key.
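
In other words, deduplication stays on the consumer side. A minimal sketch of a plain consumer that materializes the latest value per key into a map; the broker address, group id, and topic name are assumptions, and keys/values are assumed to be strings:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    public class LatestValuesReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "latest-values-reader");
            // start from the beginning when the group has no committed offsets
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            Map<String, String> latest = new HashMap<>();
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singleton("latest-values"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    // later records overwrite earlier ones, so any surviving
                    // "duplicates" collapse to the latest value per key
                    latest.put(record.key(), record.value());
                }
            }
            System.out.println(latest);
        }
    }

A real reader would poll in a loop until it reaches the topic's end offsets; a single poll is shown here only to keep the sketch short.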

About consumer groups: when a topic gets compacted, there are "holes" in the range of valid offsets. While consuming the topic normally, you skip over those automatically.

From https://kafka.apache.org/documentation.html#design_compactionbasics

Note also that all offsets remain valid positions in the log, even if the message with that offset has been compacted away; in this case this position is indistinguishable from the next highest offset that does appear in the log. For example, in the picture above the offsets 36, 37, and 38 are all equivalent positions and a read beginning at any of these offsets would return a message set beginning with 38.
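That behavior can be observed directly with seek(). A sketch reusing the offsets from the quoted example; the topic, partition, and the consumer's configuration (same as in the earlier sketch) are assumptions:

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import java.time.Duration;
    import java.util.Collections;

    public class SeekIntoCompactedRange {
        // 'consumer' is assumed to be configured as in the earlier sketch
        static ConsumerRecords<String, String> readFrom(
                KafkaConsumer<String, String> consumer, long offset) {
            TopicPartition tp = new TopicPartition("latest-values", 0);
            consumer.assign(Collections.singleton(tp));
            consumer.seek(tp, offset); // e.g. 36, even if it was compacted away
            // The batch starts at the next offset that still exists (38 in the
            // docs' example); no error is raised for the missing offsets.
            return consumer.poll(Duration.ofSeconds(1));
        }
    }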

Matthias J. Sax answered Oct 13 '22