Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write Kafka consumers - single threaded vs multi threaded

Tags:

I have written a single Kafka consumer (using Spring Kafka), that reads from a single topic and is a part of a consumer group. Once a message is consumed, it will perform all downstream operations and move on to the next message offset. I have packaged this as a WAR file and my deployment pipeline pushes this out to a single instance. Using my deployment pipeline, I could potentially deploy this artifact to multiple instances in my deployment pool.

However, I am not able to understand the following, when I want multiple consumers as part of my infrastructure -

  • I can actually define multiple instances in my deployment pool and have this WAR running on all those instances. This would mean, all of them are listening to the same topic, are a part of the same consumer group and will actually divide the partitions among themselves. The downstream logic will work as is. This works perfectly fine for my use case, however, I am not sure, if this is the optimal approach to follow ?

  • Reading online, I came across resources here and here, where people are defining a single consumer thread, but internally, creating multiple worker threads. There are also examples where we could define multiple consumer threads that do the downstream logic. Thinking about these approaches and mapping them to deployment environments, we could achieve the same result (as my theoretical solution above could), but with less number of machines.

Personally, I think my solution is simple, scalable but might not be optimal, while the second approach might be optimal, but wanted to know your experiences, suggestions or any other metrics / constraints I should consider ? Also, I am thinking with my theoretical solution, I could actually employ bare bones simple machines as Kafka consumers.

While I know, I haven’t posted any code, please let me know if I need to move this question to another forum. If you need specific code examples, I can provide them too, but I didn’t think they are important, in the context of my question.

like image 209
user3842182 Avatar asked Apr 26 '18 21:04

user3842182


People also ask

Can Kafka consumer be multi-threaded?

You can't have multiple consumers that belong to the same group in one thread and you can't have multiple threads safely use the same consumer. One consumer per thread is the rule. To run multiple consumers in the same group in one application, you will need to run each in its own thread.

How do you make Kafka consumer thread safe?

The Kafka consumer is NOT thread-safe. All network I/O happens in the thread of the application making the call. It is the responsibility of the user to ensure that multi-threaded access is properly synchronized. Un-synchronized access will result in ConcurrentModificationException.

Can multiple consumers read from a single partition Kafka?

So the rule in Kafka is only one consumer in a consumer group can be assigned to consume messages from a partition in a topic and hence multiple Kafka consumers from a consumer group can not read the same message from a partition.

Can consumer consume from multiple topics?

Multi-Topic Consumers We may have a consumer group that listens to multiple topics. If they have the same key-partitioning scheme and number of partitions across two topics, we can join data across the two topics.


1 Answers

Your existing solution is best. Handing off to another thread will cause problems with offset management. Spring kafka allows you to run multiple threads in each instance, as long as you have enough partitions.

like image 111
Gary Russell Avatar answered Oct 07 '22 03:10

Gary Russell