
Consuming from a single Kafka partition with multiple consumers

I read the following in the Kafka docs:

  • The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time.
  • Kafka only provides a total order over records within a partition, not between different partitions in a topic.
  • Per-partition ordering combined with the ability to partition data by key is sufficient for most applications.
  • However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.

I read the following on this page:

  • Consumers read from any single partition, allowing you to scale throughput of message consumption in a similar fashion to message production.
  • Consumers can also be organized into consumer groups for a given topic — each consumer within the group reads from a unique partition and the group as a whole consumes all messages from the entire topic.
  • If you have more consumers than partitions then some consumers will be idle because they have no partitions to read from.
  • If you have more partitions than consumers then consumers will receive messages from multiple partitions.
  • If you have equal numbers of consumers and partitions, each consumer reads messages in order from exactly one partition.
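
The three cases in the list above can be sketched with a toy model. This is only an illustration of the "one owner per partition" rule, not Kafka's actual assignor: the function `assign_partitions` and the consumer names are hypothetical, and a simple round-robin is assumed.

```python
def assign_partitions(num_partitions, consumers):
    """Toy model: give every partition exactly one owner within a group.

    Hypothetical helper for illustration only; real Kafka assignors
    are more involved, but they keep the same one-owner-per-partition rule.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        # Round-robin: partition p goes to consumer p mod (number of consumers).
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# More consumers than partitions: the extra consumer gets nothing and sits idle.
more_consumers = assign_partitions(1, ["c1", "c2"])

# More partitions than consumers: each consumer reads several partitions.
more_partitions = assign_partitions(4, ["c1", "c2"])

# Equal numbers: each consumer reads exactly one partition.
equal = assign_partitions(2, ["c1", "c2"])
```

In this model a partition never appears in two consumers' lists, which is the property the quoted text describes.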

Doubts

  1. Does this mean that a single partition cannot be consumed by multiple consumers? Can't we have a single partition and a consumer group with more than one consumer, and have them all consume from that single partition?

  2. If a single partition can be consumed by only a single consumer, what is the reason for this design decision?

  3. What if I need a total order over records and still need them to be consumed in parallel? Is this impossible in Kafka? Or does such a scenario not make sense?

anir asked Sep 16 '19 07:09




1 Answer

  1. Within a consumer group, a partition can be consumed by only one consumer at any given time. No, you can't have two consumers in the same group consuming from the same partition at the same time.

  2. Kafka consumer groups allow multiple consumers to behave, in a sense, like a single entity. The group as a whole should consume each message only once. If multiple consumers in a group were to consume the same partition, those records would be processed multiple times.

    If you need to consume a partition multiple times, make sure the consumers are in different groups.

  3. When processing needs to happen in order (serially), there is only ever a single task to do at any time. If you have records 1, 2 and 3 and want to process them in order, you cannot do anything until record 1 has been processed. The same goes for records 2 and 3. So what do you want to do in parallel?
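
If order only needs to hold per key rather than across the whole topic, which is the case the docs quoted in the question call "sufficient for most applications", partitioning by key gives both ordering and parallelism. The sketch below is a toy model under stated assumptions: `partition_for` and `route` are hypothetical helpers, and `zlib.crc32` stands in for the hash (Kafka's default partitioner uses a different hash function).

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Stable stand-in hash (assumption): the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions):
    """Append each (key, value) record to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


# Records arrive interleaved, but each key's values are produced in order 1, 2, 3...
records = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
routed = route(records, num_partitions=3)

# Each partition can now go to a different consumer in the group;
# within a partition, every key's records are still in their original order.
for partition, items in sorted(routed.items()):
    print(partition, items)
```

There is still no total order across keys here; only the relative order of records sharing a key is preserved, which is the trade-off the answer describes.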

Mickael Maison answered Oct 19 '22 04:10