Delayed message consumption in Kafka

How can I produce and consume delayed messages with Apache Kafka? It seems that neither standard Kafka nor the Java kafka-client offers this feature out of the box. I know I could implement it myself with the standard wait/notify mechanism, but that doesn't seem very reliable, so any advice and good practices are appreciated.

I found a related question, but it didn't help. As I understand it, Kafka is based on sequential reads from the file system and can only be used to read topics straight through while preserving message ordering. Am I right?

asked Dec 27 '17 by Everv0id

1 Answer

Indeed, Kafka's lowest-level structure is the partition: a sequence of events in a queue with incrementing offsets. You can't insert a record anywhere other than at the end, at the moment you produce it. There is no concept of delayed messages.

What do you want to achieve exactly?

Some possibilities in your case:

  • You want to push a message at a specific time (for example, an event "start job"). In this case, use a scheduled task (not from Kafka; use any standard mechanism of your OS, language, or a custom app) to send the message at the given time - consumers will receive it at the proper moment. A minimal sketch follows this list.

  • You want to send an event now, but it should not be taken into account by consumers until later. In this case, you can use a custom structure that includes a "time" field in its payload. Consumers will have to understand this field and apply custom processing to deal with it, for example: "start job at 2017-12-27T20:00:00Z" (see the second sketch after this list). You could also use headers for this, but headers are not supported by all clients yet.

  • You can change the timestamp of the message you send. Internally it would still be read in order, but some time-related functions would behave differently, and the consumer could use the message's timestamp to decide when to act - this is much like the previous option, except the timestamp is metadata of the event rather than part of the payload itself. I would not use this personally - I only deal with the timestamp when I proxy events.
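
A minimal sketch of the first option, using Java's ScheduledExecutorService to trigger the produce call at the desired time. The broker address, the "jobs" topic and the payload are assumptions for illustration, not anything from your setup:

    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DelayedSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

            // Produce the message 30 minutes from now; until then it simply
            // does not exist in Kafka, so consumers need no special logic.
            scheduler.schedule(
                    () -> producer.send(new ProducerRecord<>("jobs", "job-42", "start job")),
                    30, TimeUnit.MINUTES);
        }
    }

The trade-off is that the pending send lives only in this process's memory, so for anything important you would back the schedule with something durable (cron, Quartz, a job table) rather than an in-process executor.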

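And a sketch of the second option (and of the third, if you read the due time from record.timestamp() instead of the payload): the consumer extracts a due time and defers processing until it is reached. The "jobs" topic, the group id and the "<ISO-8601 time>|<payload>" value convention are all assumptions:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class DeferredConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "delayed-jobs");            // assumed consumer group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("jobs"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Convention assumed here: value is "<ISO-8601 due time>|<payload>".
                        String[] parts = record.value().split("\\|", 2);
                        Instant dueAt = Instant.parse(parts[0]);
                        long waitMs = Duration.between(Instant.now(), dueAt).toMillis();
                        if (waitMs > 0) {
                            // Naive wait: keep it below max.poll.interval.ms, or
                            // pause()/seek() instead of blocking the poll loop.
                            Thread.sleep(waitMs);
                        }
                        process(parts[1]);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private static void process(String payload) {
            System.out.println("Executing: " + payload);
        }
    }

Blocking the poll loop like this only works for short delays; for long ones, pause() the partition and seek() back to the record, or route not-yet-due messages to a separate retry topic.
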
For your last question: basically, yes, but with some notes:

  • Topics are actually split into partitions, and order is only preserved within a partition. All messages with the same key are sent to the same partition.
  • Most of the time you only read from memory, unless you read old events - in that case they are read sequentially from disk, which is still very fast.
  • You can choose where to begin reading - from a given offset or a given time - and even change it at runtime (see the sketch after this list).
  • You can parallelize reads across processes - multiple consumers can read the same topic without ever reading the same message twice (each reads different partitions; see consumer groups).
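
For example, here is a sketch of seeking by time with the plain Java consumer: offsetsForTimes() maps a timestamp to the first offset at or after it, and seek() repositions the consumer there. The broker address and the "jobs" topic are assumptions:

    import java.time.Duration;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReadFromTimestamp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "replay");                   // assumed consumer group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Assign all partitions of the topic explicitly so we can seek freely.
                List<TopicPartition> partitions = consumer.partitionsFor("jobs").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                consumer.assign(partitions);

                // Start reading from the first offset whose timestamp is >= one hour ago.
                long oneHourAgo = System.currentTimeMillis() - 3_600_000L;
                Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(
                        partitions.stream().collect(Collectors.toMap(tp -> tp, tp -> oneHourAgo)));
                offsets.forEach((tp, ot) -> {
                    if (ot != null) {
                        consumer.seek(tp, ot.offset());
                    }
                });

                consumer.poll(Duration.ofSeconds(1))
                        .forEach(r -> System.out.printf("%s-%d@%d: %s%n",
                                r.topic(), r.partition(), r.offset(), r.value()));
            }
        }
    }

With subscribe() and a consumer group you get the parallel, no-duplicate reading described above; assign() is used here only because it keeps the seeking easy to show.
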
answered Oct 09 '22 by Treziac