
Delete message after consuming it in Kafka

I am using Apache Kafka to produce and consume a file 5GB in size. I want to know if there is a way to have a message removed from the topic automatically after it is consumed. Is there any way to keep track of consumed messages? I don't want to delete them manually.

Shaik Mujahid Ali asked Feb 18 '15 14:02



2 Answers

In Kafka, keeping track of what has been consumed is the responsibility of the consumer, which is also one of the main reasons Kafka scales so well horizontally.

The high-level consumer API does this for you automatically by committing consumed offsets to ZooKeeper (or, as a more recent configuration option, to a special Kafka topic that keeps track of consumed offsets).

The simple consumer API makes you deal with how and where to keep track of consumed messages yourself.
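To make the offset idea concrete, here is a minimal in-memory sketch. It does not use any Kafka client; `ToyPartition` and the group names are hypothetical and only model the behaviour described above: reading never removes a record, and each consumer group merely remembers its own position.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """In-memory stand-in for one partition: an append-only list of records."""
    records: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # group id -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        # Reading starts at the group's committed offset and removes nothing.
        start = self.committed.get(group, 0)
        return start, self.records[start:start + max_records]

    def commit(self, group, next_offset):
        # Only the group's position changes; the log itself is untouched.
        self.committed[group] = next_offset


partition = ToyPartition()
for chunk in ("chunk-0", "chunk-1", "chunk-2"):
    partition.append(chunk)

# group-a reads two records and commits its position.
start, batch = partition.poll("group-a", max_records=2)
partition.commit("group-a", start + len(batch))

# group-a resumes after its commit; group-b still sees everything.
_, rest = partition.poll("group-a")
_, other_group_view = partition.poll("group-b")
```

In this model "tracking consumed messages" is just the `committed` mapping, which is roughly the role the committed offsets play for a real consumer group.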

Kafka purges messages automatically, based on either a retention time or a size limit on disk configured for the topic. In your case, the 5GB file will be deleted once the retention period you define has passed, regardless of whether it has been consumed.
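As a rough illustration of time-based retention, the sketch below filters records purely by age. `retention.ms` and `retention.bytes` are real topic-level setting names, but the values, the record layout, and the `surviving_records` helper are assumptions made for this example. Real brokers also delete whole log segments rather than individual records, so actual removal is coarser than shown here.

```python
def surviving_records(records, now_ms, retention_ms):
    """Keep records younger than retention_ms; consumption state is never consulted."""
    return [r for r in records if now_ms - r["timestamp_ms"] < retention_ms]


# Topic-level setting names as used by Kafka; the values are illustrative only.
topic_config = {
    "retention.ms": str(60 * 60 * 1000),     # keep data for about one hour
    "retention.bytes": str(10 * 1024 ** 3),  # size cap per partition, about 10 GiB
}

# Hypothetical records: the "consumed" flag exists only to show it is ignored.
records = [
    {"offset": 0, "timestamp_ms": 0, "consumed": False},
    {"offset": 1, "timestamp_ms": 3_000_000, "consumed": True},
    {"offset": 2, "timestamp_ms": 3_500_000, "consumed": False},
]

now_ms = 4_000_000
kept = surviving_records(records, now_ms, int(topic_config["retention.ms"]))
kept_offsets = [r["offset"] for r in kept]
```

Offset 0 is dropped even though nobody consumed it, and offset 1 is kept even though it was consumed, which is the point made above.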

Lundahl answered Sep 18 '22 15:09


You cannot delete a Kafka message on consumption

Kafka does not have a mechanism to directly delete a message when it is consumed.

The closest thing I have found is the following trick. It is untested and, by design, will not work on the most recent messages:

A potential trick is to combine (a) a compacted topic, (b) a custom partitioner, and (c) a pair of interceptors.

The process would be as follows:

  1. Use a producer interceptor to add a GUID to the end of the key before it is written.
  2. Use a custom partitioner that ignores the GUID for the purposes of partitioning.
  3. Use a compacted topic, so you can then delete any individual message you need via producer.send(key+GUID, null).
  4. Use a consumer interceptor to remove the GUID on read.
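The four steps can be modelled without a broker. The sketch below is a toy simulation, not Kafka client code: the `|` separator, the helper names, and the `compact` function are all hypothetical, and `compact` only imitates the end result of log compaction (latest value per key survives, a key whose latest value is null disappears). Real compaction runs in the background and skips the active segment, which is why the trick cannot remove the most recent messages.

```python
import uuid
import zlib

SEPARATOR = "|"  # hypothetical delimiter between the original key and the GUID


def add_guid(key):
    """Step 1: what a producer interceptor would do before the write."""
    return f"{key}{SEPARATOR}{uuid.uuid4()}"


def strip_guid(stored_key):
    """Step 4: what a consumer interceptor would do on read."""
    return stored_key.rsplit(SEPARATOR, 1)[0]


def partition_for(stored_key, num_partitions):
    """Step 2: partition on the original key only, ignoring the GUID."""
    return zlib.crc32(strip_guid(stored_key).encode()) % num_partitions


def compact(log):
    """Step 3 (toy model): keep the latest record per key, drop tombstoned keys."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if last_index[key] == i and value is not None
    ]


log = []
first_key = add_guid("order-42")
second_key = add_guid("order-42")
log.append((first_key, "first"))
log.append((second_key, "second"))
log.append((first_key, None))  # tombstone: the send(key+GUID, null) from step 3

compacted = compact(log)
consumer_view = [(strip_guid(key), value) for key, value in compacted]
```

Because every record gets a unique key, the tombstone removes exactly one message while other messages with the same original key survive and still land in the same partition.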

But you should not need this capability.

Have one or more consumers and want each message to be consumed only once in total by them?
Put them in the same consumer group.

Want to avoid too many messages filling up the disk?
Set up retention in terms of disk space and/or time.
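The consumer group advice works because, within one group, each partition is read by exactly one member at a time. A minimal sketch of that idea, assuming a simple round-robin split (the `assign_partitions` helper and the consumer names are hypothetical, and Kafka's real assignors are more involved):

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])

# Flatten the owned partitions to check that none is shared or left out.
owned = sorted(p for parts in assignment.values() for p in parts)
```

Since no partition has two owners inside the group, a given message is handed to only one of the group's consumers; a second group would get its own independent assignment and see every message again.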

Dennis Jaheruddin answered Sep 21 '22 15:09