I want to understand how to design message passing between microservices so that the system stays elastic and scalable. Say a service A emits events that are consumed by instances of a service B. I see three options:

Option 1: A publishes every event to a single topic a.entity.created, including information such as the id of the created entity in the message body. The instances of B subscribe to a.entity.created as a consumer group b-consumers, which distributes the load across the instances of B.

Option 2: A publishes each event to a topic per entity, a.entity.created.{entityId}, leading to a large number of topics. The instances of B subscribe to a.entity.created.* with a wildcard as a consumer group b-consumers, which distributes the load across the instances of B.

Option 3: A publishes every event to a single topic a.entity.created with a partition key based on the entityId. The instances of B subscribe to a.entity.created as a consumer group b-consumers, which distributes the load across the instances of B, and events regarding the same entity are delivered in order thanks to the partition key.
You don't need a separate topic per entity to ensure delivery order of messages.
Instead, assign a partition key on each message based on your entity id (or other immutable identifier for the entity) to ensure that events concerning the same entity are always delivered to the same partition on each topic.
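This behaviour can be illustrated with a small, self-contained simulation (plain Python, no broker required). Note that Kafka's Java client actually hashes keys with murmur2; md5 stands in for it here purely to show the invariant that a given key always maps to the same partition:

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition, like Kafka's default partitioner.

    The real Java client computes murmur2(keyBytes) % numPartitions; md5 is
    used here only as a stand-in deterministic hash for illustration.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events keyed by the same entity id land on the same partition,
# so their relative order is preserved for the consumer.
events = ["E1", "E2", "E1", "E3", "E1"]
partitions = [partition_for(e) for e in events]
assert partitions[0] == partitions[2] == partitions[4]  # every E1 event -> one partition
```

Because the mapping is a pure function of the key, it holds no matter which producer instance sends the message.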
Chronological ordering of messages on the partition will generally be preserved even if more than one producer instance publishes messages for the same entity (e.g. if A1 and A2 both publish messages for entity E1, the partition key for E1 ensures that all E1 messages are delivered to the same partition, P1). There are some edge cases where ordering is not preserved (e.g. a producer losing connectivity and retrying a send), in which case you might look at enabling idempotence.
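Idempotence is a producer-side setting. A minimal sketch of the relevant configuration, using the standard Kafka producer property names written as a librdkafka/confluent-kafka-style dict (the broker address is an assumption):

```python
# Standard Kafka producer settings; enable.idempotence makes the broker
# de-duplicate retried sends, so per-partition ordering survives retries.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker address
    "enable.idempotence": True,             # de-duplicates producer retries per partition
    "acks": "all",                          # required by (and implied when enabling) idempotence
}
```

In the Java client, enabling idempotence also constrains retries and max.in.flight.requests.per.connection to values compatible with ordered, de-duplicated delivery.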
You are right: at any one time, each topic partition is assigned to at most one consumer within a consumer group, although a single consumer can be assigned more than one partition. So, for example, consumer B2 might process all messages from partition P1 (in sequential order) as well as from another partition. If a consumer dies, another consumer in the group is assigned its partitions (the rebalance can take several seconds).
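A toy model of that assignment can make the invariant concrete. This is not the real group protocol, just a round-robin stand-in for Kafka's assignors, showing that each partition belongs to at most one consumer and that a dead consumer's partitions move to the survivors:

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions across consumers (a simplified stand-in for
    Kafka's range/round-robin partition assignors)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["P0", "P1", "P2", "P3", "P4", "P5"]

before = assign_partitions(partitions, ["B1", "B2"])
# B1 -> [P0, P2, P4], B2 -> [P1, P3, P5]: no partition has two consumers.

# If B2 dies, a rebalance reassigns its partitions to the survivors:
after = assign_partitions(partitions, ["B1"])
assert after["B1"] == partitions  # B1 now owns every partition
```

Within each partition the surviving consumer still reads messages in order; only the partition-to-consumer mapping changes.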
When a partition key is provided on messages, by default the message-to-partition assignment is done based on a hash of the partition key. In rare scenarios (e.g. very few distinct entity partition keys), this could result in an uneven distribution of messages across partitions, in which case you could provide your own partitioning strategy, if you find that throughput is affected by the imbalance.
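A custom partitioning strategy is ultimately just a key-to-partition function you supply. As a sketch of one possible approach (the pinned hot-key mapping below is a made-up example, and md5 again stands in for a real hash), you could pin known hot keys to dedicated partitions and hash everything else over the rest:

```python
import hashlib

NUM_PARTITIONS = 6

# Hypothetical example: pin a few known hot entity ids to dedicated
# partitions, and fall back to hashing for all other keys.
PINNED = {"hot-entity-1": 0, "hot-entity-2": 1}

def custom_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    if key in PINNED:
        return PINNED[key]
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Reserve the pinned partitions; spread the remaining keys over the rest.
    return len(PINNED) + int.from_bytes(digest[:4], "big") % (num_partitions - len(PINNED))

assert custom_partition("hot-entity-1") == 0
assert 2 <= custom_partition("some-other-entity") < NUM_PARTITIONS
```

The function is still deterministic per key, so per-entity ordering is preserved. In the Java client this logic would live in a Partitioner implementation configured via partitioner.class.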
Based on the above, after configuring an appropriate number of partitions for your scenario, you should be able to meet your design goals using consistent partition keys, without much customisation of Kafka at all.