
What is the best practice for consuming messages from multiple Kafka topics?

I need to consume messages from different Kafka topics.

  1. Should I create a separate consumer instance per topic and then start a processing thread for each partition, or
  2. should I subscribe to all topics from a single consumer instance and then start separate processing threads?

Thanks & regards, Megha

Megha asked Oct 08 '17 08:10



2 Answers

The only rule is that you have to account for what Kafka does and doesn't guarantee:

  • Kafka only guarantees message order within a single topic/partition. This also means you can receive messages out of order if your single-topic Consumer switches partitions for some reason.
  • When you subscribe to multiple topics with a single Consumer, that Consumer is assigned a topic/partition pair for each requested topic.
  • That means the order of incoming messages within any one topic/partition will be correct, but you cannot guarantee that ordering between topics will be chronological.
  • You also can't guarantee that you will get messages from any particular subscribed topic in any given period of time.
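If you go with a single subscribing Consumer and several processing threads, one way to stay within the per-partition ordering guarantee is to route every record to a worker chosen by its topic/partition. Below is a minimal sketch of that routing only, in plain Python with no Kafka client involved. `Record`, `worker`, `dispatch`, and the in-memory list of records are hypothetical stand-ins for whatever your client's poll() actually returns.

```python
import queue
import threading
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a consumed Kafka record (not a client class).
Record = namedtuple("Record", ["topic", "partition", "offset", "value"])


def worker(inbox, processed, lock):
    """Process records from one inbox in arrival order until a None sentinel."""
    while True:
        rec = inbox.get()
        if rec is None:
            break
        with lock:
            processed[(rec.topic, rec.partition)].append(rec.offset)


def dispatch(records, num_workers=3):
    """Route each record to a worker chosen by its (topic, partition).

    All records of one topic/partition go to the same worker, so their
    relative order is preserved. Nothing is guaranteed across partitions.
    """
    inboxes = [queue.Queue() for _ in range(num_workers)]
    processed = defaultdict(list)
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, processed, lock))
        for q in inboxes
    ]
    for t in threads:
        t.start()

    assignment = {}  # (topic, partition) -> worker index, fixed on first sight
    for rec in records:
        key = (rec.topic, rec.partition)
        idx = assignment.setdefault(key, len(assignment) % num_workers)
        inboxes[idx].put(rec)

    for q in inboxes:
        q.put(None)  # tell each worker to stop
    for t in threads:
        t.join()
    return dict(processed)
```

Each topic/partition sticks to one worker, so offsets within it are handled in order even though the workers run concurrently. Committing offsets safely from such a setup is a separate problem that this sketch does not address.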

I recently had a bug because my application subscribed to many topics with a single Consumer. Each topic was a live feed of images at one image per message. Since all the topics always had new images, each poll() was only returning images from the first topic to register.

If processing all messages is important, you'll need to be certain that each Consumer can process messages from all of its subscribed topics faster than the messages are created. If it can't, you'll either need more Consumers committing reads in the same group, or you'll have to be OK with the fact that some messages may never be processed.
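As a rough back-of-the-envelope check for the "can it keep up" question, you can compare the production rate with the per-message processing time. This is only a steady-state estimate under assumed averages; `consumers_needed` and its parameters are hypothetical names for illustration, not part of any Kafka API.

```python
import math


def consumers_needed(msgs_per_sec, secs_per_msg, partitions):
    """Estimate how many consumers in one group are needed to keep up.

    msgs_per_sec: combined production rate of the subscribed topics
    secs_per_msg: average time one consumer spends on one message
    partitions:   total partitions across the subscribed topics
    Returns (needed, feasible).
    """
    needed = max(1, math.ceil(msgs_per_sec * secs_per_msg))
    # Within a group, each partition is read by at most one consumer, so
    # consumers beyond the partition count would sit idle.
    return needed, needed <= partitions


print(consumers_needed(10, 0.25, 6))  # (3, True)
print(consumers_needed(30, 0.25, 6))  # (8, False)
```

In the second case, 30 messages per second at 0.25 s each calls for 8 consumers, which 6 partitions cannot support, so you would need more partitions, faster processing, or to accept dropped messages.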

One Consumer per topic is the simplest approach, but the additional Consumers do add some overhead. You'll have to determine whether that matters based on your needs.

The only way to correctly answer your question is to evaluate your application's specific requirements and capabilities, and build something that works within those and within Kafka's limitations.

TheAtomicOption answered Nov 15 '22 05:11



This really depends on the logic of your application: does it need to see all messages together in one place, or not? Sometimes consuming from a single topic is easier to implement in terms of your application's business logic.

Alex Ott answered Nov 15 '22 05:11
