
Kafka: how to know when all related messages have been consumed

Is there any way in Kafka to produce a message once several related messages have been consumed, without having to track it manually in application code?

The use case is to take a huge file, split it into several chunks, publish one message per chunk to a topic, and, once all of those messages have been consumed, produce another message on a different topic announcing the result.

We could track the state in a database or Redis, but I wonder whether there is a higher-level approach that relies only on the Kafka ecosystem.

Luiz Henrique Martins Lins Rol asked Sep 11 '20 17:09



2 Answers

One approach is as follows:

  1. After consuming each chunk, the application should produce a status message (for example, "Consumed" plus the chunk number).
  2. A second application (for example, a Kafka Streams one) should aggregate these status messages and, once it has processed the messages for all chunks, produce a final message stating that the file has been processed.
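
To make the aggregation step concrete, here is a minimal sketch of the bookkeeping the second application would do. It is plain Python with no Kafka client, so it only stands in for the state a Kafka Streams aggregation would keep. The `ChunkStatus` fields, the `CompletionTracker` class, and the idea that every status message carries the total chunk count are assumptions made for illustration; they are not part of the answer above.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkStatus:
    """Hypothetical status message published after a chunk has been consumed."""
    file_id: str
    chunk_number: int
    total_chunks: int  # assumption: the splitter stamps the total on every chunk


@dataclass
class CompletionTracker:
    """In-memory stand-in for the state an aggregating application would keep."""
    seen: dict = field(default_factory=dict)  # file_id -> set of chunk numbers
    done: set = field(default_factory=set)    # file_ids already reported

    def on_status(self, status):
        """Record one status message; return a final message once all chunks are in."""
        if status.file_id in self.done:
            return None  # late duplicate for a file that was already reported
        chunks = self.seen.setdefault(status.file_id, set())
        chunks.add(status.chunk_number)  # a set makes redelivered messages harmless
        if len(chunks) < status.total_chunks:
            return None
        del self.seen[status.file_id]
        self.done.add(status.file_id)
        # In a real setup this dict would be produced to the result topic.
        return {"file_id": status.file_id, "status": "PROCESSED"}


if __name__ == "__main__":
    tracker = CompletionTracker()
    for n in (0, 2, 2, 1):  # chunk 2 is delivered twice on purpose
        result = tracker.on_status(ChunkStatus("report.csv", n, 3))
    print(result)
```

Tracking chunk numbers in a set rather than a plain counter is a deliberate choice here: with at-least-once delivery the same status message can arrive more than once, and a counter would reach the total too early.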
Bartosz Wardziński answered Oct 18 '22 20:10



You can use ConsumerGroupCommand to check whether a given consumer group has finished processing all messages in a particular topic:

  $ kafka-consumer-groups --bootstrap-server broker_host:port --describe --group chunk_consumer

OR

  $ kafka-run-class kafka.admin.ConsumerGroupCommand ...

Zero lag on every partition indicates that the messages have been consumed and their offsets committed by the consumer.
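
For reference, the lag reported per partition is the log-end offset minus the group's committed offset. The sketch below shows that arithmetic in plain Python. The function names and the dict-shaped inputs are hypothetical, and it does not call any Kafka API; in practice the two offset maps would come from the tool's output or from an admin client.

```python
def partition_lag(committed, log_end):
    """Return lag per partition: log-end offset minus committed offset.

    Both arguments map partition number -> offset (hypothetical input shape).
    """
    lag = {}
    for partition, end_offset in log_end.items():
        # Simplification: a partition with no committed offset is treated
        # as if nothing had been consumed from offset 0.
        lag[partition] = max(end_offset - committed.get(partition, 0), 0)
    return lag


def all_consumed(committed, log_end):
    """True when every partition of the topic shows zero lag."""
    return all(value == 0 for value in partition_lag(committed, log_end).values())


if __name__ == "__main__":
    log_end = {0: 120, 1: 98}
    print(partition_lag({0: 120, 1: 90}, log_end))
    print(all_consumed({0: 120, 1: 98}, log_end))
```

Note that zero lag only says the group has caught up with what has been published so far, so the check is meaningful only after the producer has finished publishing every chunk of the file.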

Alternatively, you can choose to subscribe to the __consumer_offsets topic and process messages from it yourself, but using ConsumerGroupCommand seems like a more straightforward solution.

mazaneicha answered Oct 18 '22 19:10
