I'm reading the Kafka documentation and noticed the following line:
Note however that there cannot be more consumer instances in a consumer group than partitions.
Hmm. How can I auto-scale this?
For example let's say I have a messaging system with hi/lo priorities, so I create a topic for messages and partitions for hi and lo priority messages.
If this was RabbitMQ, I'd have an auto-scalable group of consumers assigned to each partition, like this:

If I understand the Kafka model I can't have >1 consumer per partition in a consumer group, so that picture doesn't work for Kafka, right?
Ok, so what about >1 consumer groups like this:

That get's around Kafka's limitation but... If I understand how this works both consumer groups would be pulling from a partition, for example msg.hi, with their own offsets so neither would know about the other--meaning messages would likely be delivered twice!
How can I achieve the capability I had in the Rabbit design w/Kafka and still maintain the "queue-ness" of the behavior (I don't want to send a message twice)? What am I missing?
To scale the Kafka connector side you have to increase the number of tasks, ensuring that there are sufficient partitions. In theory, you can set the number of partitions to a large number initially, but in practice, this is a bad idea.
To scale out Kafka, you provision extra storage, update the deployment configuration, and then perform a helm upgrade. To redistribute the topic partitions over all available brokers, you then reassign topic partitions.
Increasing the number of partitions and the number of brokers in a cluster will lead to increased parallelism of message consumption, which in turn improves the throughput of a Kafka cluster; however, the time required to replicate data across replica sets will also increase.
Kafka became a standard for highly loaded streaming systems. It's horizontally scalable, very fast and reliable.
Topic is made up of partitions. Partitions decide the max number of consumers you can have in a group. 
In the above picture, we have only one consumer. It can read all the messages from all the partitions. When you increase the number of consumers in the group, partition reassignment happens and instead of consumer 1 reading all the messages from all the partitions, consumer 2 could share some of the load with consumer 1 as shown below.  What happens If I have more number of consumers than the number of partitions.? Each consumer would be assigned 1 partition. Any additional consumers in the group will be sitting idle unless you increase the number of partitions for a Topic.
 What happens If I have more number of consumers than the number of partitions.? Each consumer would be assigned 1 partition. Any additional consumers in the group will be sitting idle unless you increase the number of partitions for a Topic.

So, we need to choose the partitions accordingly. That decides the max number of consumers in the group. Changing the partition for an existing topic is really not recommended as It could cause issues. That is, Lets assume a producer producing names into a topic where we have 3 partitions. All the names starting with A-I go to Partition 1, J-R in partition 2 and S-Z in partition 3. Lets also assume that we have already produced 1 million messages. Now if you suddenly increase the number of partitions to 5 from 3, It will create a different A-Z range now. That is, A-F in Partition 1, G-K in partition 2, L-Q in partition 3, R-U in partition 4 and V-Z in partition 5. Do you get it? It kind of affects the order of the messages we had before! So you need to be aware of this. If this could be a problem, then we need to choose the partition accordingly upfront.
More info is here - http://www.vinsguru.com/kafka-scaling-consumers-out-for-a-consumer-group/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With