
Apache Kafka message consumption when partitions outnumber consumers

Tags:

apache-kafka

If I'm running a Kafka cluster with more partitions than my lone consumer group has consumers, are there any guarantees on the ordering of messages, or on the timely delivery of messages across partitions?

Simple example:
2 Partitions, 1 Consumer
The Producers are controlling Partition assignment via a key.
Message 1 comes in and goes to Partition A
Message 2 comes in and goes to Partition B
Message 3 comes in and goes to Partition A

I know Message 1 will be consumed before Message 3, because they are in the same partition. But what about Message 2? Will it be consumed before Message 3 or after? Or could it vary? Could it possibly be consumed before Message 1?

Moreover, what if new Messages continue to come in for Partition A and the production is faster than consumption? Will Message 2 sit in Partition B indefinitely? When will it be consumed? Are there any guarantees that the messages will not sit there forever?

More generally: If a consumer is assigned to multiple partitions, how and when does that consumer swap between those partitions?

TwoScoopsOfHot asked Jan 22 '14 21:01



1 Answer

Ordering guarantees

Kafka provides ordering guarantees only within a partition. In your example, Message 2 might be consumed before Message 1, between Messages 1 and 3, or after Message 3, and Kafka guarantees none of these. Which order you see depends only on when the consumer fetches from each partition and how quickly it processes what it fetched. More information on this is available in the documentation: https://kafka.apache.org/documentation.html#introduction ('Consumers' and 'Guarantees' topics).
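To make the possible orders concrete, here is a small Python model (not Kafka client code; `interleavings` is a hypothetical helper). It enumerates every order in which a single consumer could see the three messages if the only rule is that each partition's own order is preserved.

```python
from typing import List


def interleavings(a: List[int], b: List[int]) -> List[List[int]]:
    """Return every order in which one consumer could see the messages of
    two partitions, keeping each partition's internal order intact."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    # The next message delivered comes either from partition A or from B.
    from_a = [[a[0]] + rest for rest in interleavings(a[1:], b)]
    from_b = [[b[0]] + rest for rest in interleavings(a, b[1:])]
    return from_a + from_b


partition_a = [1, 3]  # Messages 1 and 3 share a key, so both land in A
partition_b = [2]     # Message 2 lands in B

for order in interleavings(partition_a, partition_b):
    print(order)
# [1, 3, 2]
# [1, 2, 3]
# [2, 1, 3]
```

Message 3 never appears before Message 1 (same partition), but Message 2 can land anywhere relative to them.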

Slow consumption

The Kafka broker is not aware of the consumers. It stores messages in log segments until the corresponding log segment gets deleted. Consumers may attach to the broker at any moment and start consumption from the oldest log segment that is still retained. How long messages are kept is controlled by two configuration properties: log.retention.hours (a time limit) and log.retention.bytes (a size limit), with possible overrides per topic. More on this in the documentation: https://kafka.apache.org/documentation.html#brokerconfigs.

To answer your question: if the consumer is persistently slower than the producer, it has a limited time to catch up (log.retention.hours defaults to 168, i.e. one week). If it doesn't catch up in time, some unconsumed messages will be deleted forever.
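The catch-up window can be estimated with a little arithmetic. The Python sketch below is a simplified model, not a description of broker internals (`hours_until_data_loss` is a hypothetical helper): it assumes constant producer and consumer rates, a consumer that starts fully caught up, and deletion exactly when a message reaches the retention age. Real brokers delete whole log segments, so actual loss starts somewhat later.

```python
def hours_until_data_loss(retention_hours: float, consume_rate: float,
                          produce_rate: float) -> float:
    """Hours until a consumer that started fully caught up would need to
    read a message older than the retention period (simplified model)."""
    if consume_rate >= produce_rate:
        return float("inf")  # the consumer never falls behind
    # After t hours the consumer has read consume_rate * t messages, which
    # were produced at time t * consume_rate / produce_rate. The message it
    # is reading is therefore t * (1 - consume_rate / produce_rate) hours old.
    return retention_hours / (1 - consume_rate / produce_rate)


# Default retention (log.retention.hours = 168) with a consumer at 90%
# of the producer's rate.
print(hours_until_data_loss(168, consume_rate=900, produce_rate=1000))
```

With these numbers the result is about 1680 hours, roughly 70 days: the consumer loses only 0.1 hours of ground per hour, so it takes ten weeks to fall a full week behind.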

Consuming multiple partitions

The high-level consumer creates several KafkaStream objects, each providing data from one or more partitions. It's up to you how to consume these streams: in separate threads, round robin, etc. It's also possible to fetch the timestamps of messages and merge the streams into a single stream that approximately restores the original message order.
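The two strategies mentioned above can be sketched in plain Python, with in-memory lists standing in for streams. This is an illustration only: the `(timestamp, payload)` message shape and the `round_robin` / `merge_by_timestamp` helpers are hypothetical, not Kafka types. The merge assumes each stream is already in non-decreasing timestamp order, and the result is only as accurate as the timestamps themselves.

```python
import heapq
from itertools import zip_longest
from typing import Iterator, List, Tuple

# A message is modelled as (timestamp, payload); hypothetical, not a Kafka type.
Message = Tuple[int, str]


def round_robin(streams: List[List[Message]]) -> Iterator[Message]:
    """Take one message from each stream in turn, so no stream is starved."""
    sentinel = object()
    for batch in zip_longest(*streams, fillvalue=sentinel):
        for msg in batch:
            if msg is not sentinel:
                yield msg


def merge_by_timestamp(streams: List[List[Message]]) -> Iterator[Message]:
    """Merge streams into one stream ordered by timestamp.

    Assumes each stream is already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda m: m[0])


partition_a = [(100, "a1"), (300, "a2"), (400, "a3")]
partition_b = [(350, "b1")]

print([p for _, p in round_robin([partition_a, partition_b])])
# ['a1', 'b1', 'a2', 'a3']
print([p for _, p in merge_by_timestamp([partition_a, partition_b])])
# ['a1', 'a2', 'b1', 'a3']
```

Round robin keeps every partition moving but ignores time; the timestamp merge restores an approximate global order. In a live system the merge would also have to wait until every stream could still supply an earlier message, which a finite list sidesteps.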

Wildfire answered Oct 21 '22 03:10