How does a Kafka sink connector ensure message ordering while fetching messages from partitions? I have multiple partitions, and I have ensured message ordering at publish time by using hash keys so that related messages land on the same partition. Now, when more than one sink task (and their workers) is scaled out across multiple JVMs with the responsibility to fetch messages from the same partitions and to notify a destination system via HTTP, how can I guarantee that the destination system will receive the messages in order?
Each sink task will receive events in order, per partition, from its assigned topics. (Note that the Connect framework assigns each partition to exactly one sink task at a time, so two tasks never consume the same partition concurrently.) But as soon as the data leaves Kafka protocol handling and is sent to a remote destination, whether that is a file or an HTTP endpoint, order can only be guaranteed by that destination system's own ordering semantics.
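To make "ordered per partition, but not globally" concrete, here is a minimal sketch with plain Java, no Kafka dependencies; the `Rec` type and its values are illustrative stand-ins for `SinkRecord`s a task might receive:

```java
import java.util.*;

public class PartitionOrdering {
    // Minimal stand-in for a SinkRecord: just partition, offset, value.
    public record Rec(int partition, long offset, String value) {}

    // Group the offsets a task saw by partition, preserving delivery order.
    public static Map<Integer, List<Long>> offsetsByPartition(List<Rec> delivered) {
        Map<Integer, List<Long>> byPartition = new TreeMap<>();
        for (Rec r : delivered)
            byPartition.computeIfAbsent(r.partition(), p -> new ArrayList<>()).add(r.offset());
        return byPartition;
    }

    public static void main(String[] args) {
        // A task's input interleaves its assigned partitions, but within each
        // partition offsets only ever increase.
        List<Rec> delivered = List.of(
            new Rec(0, 0, "a0"), new Rec(1, 0, "b0"),
            new Rec(0, 1, "a1"), new Rec(1, 1, "b1"),
            new Rec(1, 2, "b2"), new Rec(0, 2, "a2"));

        // Each per-partition sequence comes out already in offset order; there
        // is no defined order between partition 0 and partition 1.
        System.out.println(offsetsByPartition(delivered));
        // prints {0=[0, 1, 2], 1=[0, 1, 2]}
    }
}
```

The destination system only keeps this guarantee if the task forwards each partition's records sequentially, which is why the delivery side matters.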
For example, if you're writing to Elasticsearch, you can "order" events (in Kibana) by sorting on an indexed timestamp field. The same applies to any SQL or NoSQL database.
A filesystem, on the other hand, would order files by modification time, but events within any given file aren't guaranteed to be ordered (unless they all come from one partition).
I find it unlikely that a generic HTTP REST endpoint will understand what order events need to be consumed in; that logic would have to be implemented inside the server endpoint itself. One option would be to POST events to an endpoint along with the partition number and the offset each record came from, so the server can reconstruct the order.
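A sketch of that option, assuming a hypothetical destination that accepts JSON: the task wraps each record's value in an envelope carrying its topic, partition, and offset, so the receiving server can sort or de-duplicate by (partition, offset). Only the payload construction is shown; the actual POST could be done with any HTTP client, e.g. `java.net.http.HttpClient`:

```java
public class EventEnvelope {
    // Build a JSON envelope carrying the Kafka coordinates alongside the value.
    // A receiving endpoint can sort by (partition, offset) to restore per-partition
    // order, and treat (topic, partition, offset) as an idempotency key on retries.
    public static String toJson(String topic, int partition, long offset, String value) {
        return String.format(
            "{\"topic\":\"%s\",\"partition\":%d,\"offset\":%d,\"value\":\"%s\"}",
            topic, partition, offset, value);
    }

    public static void main(String[] args) {
        // Hypothetical record coordinates, purely for illustration.
        System.out.println(toJson("orders", 3, 1042, "order-created"));
        // prints {"topic":"orders","partition":3,"offset":1042,"value":"order-created"}
    }
}
```

The idempotency-key angle is a useful side effect: since Kafka Connect provides at-least-once delivery by default, the destination may see the same (partition, offset) twice and can safely discard the duplicate.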