
Apache Kafka: guaranteeing the order of messages within a partition

I read this article about message ordering within a topic partition: https://blog.softwaremill.com/does-kafka-really-guarantee-the-order-of-messages-3ca849fd19d2

Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first.
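To make the quoted scenario concrete, here is a minimal toy simulation (not Kafka code; the function name `deliver` and its parameters are made up for illustration). It assumes every batch in the in-flight window is attempted before any retry is sent, which is the situation the quote describes.

```python
def deliver(batches, fail_first_attempt, max_in_flight):
    """Return the order in which batches land in the partition log.

    batches: batch names in send order.
    fail_first_attempt: names whose first send attempt fails and is retried.
    max_in_flight: how many unacknowledged batches may be outstanding.
    """
    log = []
    pending = list(batches)
    failed_once = set()
    while pending:
        window = pending[:max_in_flight]  # batches sent concurrently
        retry = []
        for batch in window:
            if batch in fail_first_attempt and batch not in failed_once:
                failed_once.add(batch)
                retry.append(batch)       # has to be sent again later
            else:
                log.append(batch)         # written to the log
        # Retried batches go back to the front of the queue, but anything
        # that already succeeded in this window is ahead of them in the log.
        pending = retry + pending[len(window):]
    return log


reordered = deliver(["A", "B"], fail_first_attempt={"A"}, max_in_flight=2)
in_order = deliver(["A", "B"], fail_first_attempt={"A"}, max_in_flight=1)
print(reordered)  # ['B', 'A']
print(in_order)   # ['A', 'B']
```

With two batches in flight, B is written while A is still waiting for its retry; with one in flight, B is not sent until A has succeeded.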

According to it, there are two producer configurations that can achieve the ordering guarantee:

max.in.flight.requests.per.connection=1 // can impact producer throughput

or, alternatively:

enable.idempotence=true
max.in.flight.requests.per.connection // to be less than or equal to 5
retries // to be greater than 0
acks=all

Can anybody explain how the second configuration achieves the ordering guarantee? Also, exactly-once semantics are enabled in the second configuration.

GreenNun asked Nov 07 '19 16:11


2 Answers

Idempotence (exactly-once, in-order semantics per partition)

Idempotent delivery enables the producer to write a message to Kafka exactly once to a particular partition of a topic during the lifetime of a single producer, without data loss and while preserving order within the partition.

Idempotence is one of the key features for achieving exactly-once semantics in Kafka. Setting “enable.idempotence=true” gives you exactly-once semantics per partition, meaning no duplicates and no data loss for a particular partition. If an error occurs and the producer sends a message multiple times, it is still written to Kafka only once.

The Kafka producer uses the concepts of a PID and a sequence number to achieve idempotence, as explained below:

PID and Sequence Number

Idempotent producers use a producer ID (PID) and a sequence number when producing messages. The producer increments the sequence number on each message it publishes, and the sequence is tied to its unique PID. The broker compares the incoming sequence number with the last one it accepted and rejects the message if the new number is not exactly one greater: a number that is not greater indicates a duplicate, and a gap of more than one indicates lost messages.
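The sequence check can be sketched as a toy model. This is not broker code: the class `PartitionLog`, its method, and the return strings are hypothetical names chosen for illustration, and the sketch assumes sequence numbers start at 0 for a new PID.

```python
class PartitionLog:
    """Toy model of one partition's sequence check (not real broker code)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer id -> last accepted sequence number

    def append(self, pid, seq, value):
        expected = self.last_seq.get(pid, -1) + 1
        if seq < expected:
            return "duplicate"     # already written; do not append again
        if seq > expected:
            return "out_of_order"  # a gap: an earlier message is missing
        self.records.append(value)
        self.last_seq[pid] = seq
        return "appended"


log = PartitionLog()
results = [
    log.append(pid=7, seq=0, value="m0"),
    log.append(pid=7, seq=1, value="m1"),
    log.append(pid=7, seq=1, value="m1"),  # retry after a lost acknowledgement
    log.append(pid=7, seq=3, value="m3"),  # seq 2 never arrived
]
print(results)      # ['appended', 'appended', 'duplicate', 'out_of_order']
print(log.records)  # ['m0', 'm1']
```

The retried m1 is recognized by its sequence number and not written twice, and m3 is refused because accepting it would leave a hole where sequence 2 belongs.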


In a failure scenario the broker still tracks the sequence number and avoids duplication: a retried message that was already written is recognized by its sequence number and is not appended again.

Note: When the producer restarts, a new PID is assigned, so idempotence is guaranteed only within a single producer session.

If you use enable.idempotence=true, you can set max.in.flight.requests.per.connection as high as 5 and still get the ordering guarantee, which allows more parallelism and better performance.

The idempotence feature was introduced in Kafka 0.11. Before that, some level of guarantee could be achieved using max.in.flight.requests.per.connection together with the retries and acks settings:

max.in.flight.requests.per.connection=1
retries set to a large number
acks=all

max.in.flight.requests.per.connection=1 ensures that no additional messages are sent while a message is being retried.

This gives an at-least-once guarantee, but at a cost in performance and throughput. That cost motivated the enable.idempotence feature, which improves performance while still guaranteeing ordering.
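For comparison, the two setups can be written side by side as plain dictionaries keyed by the producer property names. This is only a sketch: the retry count is an arbitrary large value, and `keeps_partition_order` is a hypothetical helper that encodes the rule of thumb from this answer, not something a Kafka client provides.

```python
# Setup 1: one request at a time (the approach used before idempotence).
single_in_flight = {
    "acks": "all",
    "retries": 2147483647,  # illustrative "large number"
    "max.in.flight.requests.per.connection": 1,
}

# Setup 2: idempotent producer with up to five requests in flight.
idempotent = {
    "enable.idempotence": True,
    "acks": "all",
    "retries": 2147483647,  # illustrative "large number"
    "max.in.flight.requests.per.connection": 5,
}


def keeps_partition_order(config):
    """Hypothetical check of the ordering rule of thumb described above."""
    in_flight = config["max.in.flight.requests.per.connection"]
    retries = config["retries"]
    if config.get("enable.idempotence", False):
        return in_flight <= 5 and retries > 0 and config["acks"] == "all"
    # Without idempotence, retries can reorder batches unless only one
    # request is in flight at a time.
    return in_flight == 1 or retries == 0


print(keeps_partition_order(single_in_flight))  # True
print(keeps_partition_order(idempotent))        # True
```

Raising the in-flight limit to 5 in the first setup, without enabling idempotence, makes the helper return False, which matches the reordering risk described in the question.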

exactly_once: To achieve exactly-once processing on top of idempotence, consumers must use the read_committed isolation level, and the following parameters cannot be overridden:

  • isolation.level=read_committed (consumers always read committed data only)

  • enable.idempotence=true (the producer always has idempotence enabled)

  • max.in.flight.requests.per.connection=5 (the producer has at most five in-flight requests per connection)

Nitin answered Sep 28 '22 14:09


enable.idempotence is a newer setting that was introduced as part of KIP-98 (implemented in Kafka 0.11+). Before it, users had to set max.in.flight.requests.per.connection to 1.

The way it works (abbreviated) is that producers now put sequence numbers on outgoing produce batches, and brokers keep track of these sequence numbers per producer connected to them. If a broker receives a batch out of order (say batch 3 after batch 1), it rejects it and expects to see batch 2 (which the producer will retransmit). For complete details you should read KIP-98.
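The reject-and-retransmit behaviour described here can be illustrated with a small toy simulation. It is a sketch under simplifying assumptions, not the protocol from KIP-98: the function `send_with_idempotence` and its parameters are invented for this example, a "lost" batch simply never reaches the broker on its first attempt, and rejected batches are resent in sequence order.

```python
def send_with_idempotence(batches, lost_on_first_try, max_in_flight=5):
    """Toy model: return the partition log after all retries complete."""
    log = []
    next_expected = 0                 # broker side: next sequence to accept
    queue = list(enumerate(batches))  # producer side: (sequence, batch)
    already_lost = set()
    while queue:
        window, queue = queue[:max_in_flight], queue[max_in_flight:]
        resend = []
        for seq, batch in window:
            if seq in lost_on_first_try and seq not in already_lost:
                already_lost.add(seq)        # never reaches the broker
                resend.append((seq, batch))
            elif seq == next_expected:
                log.append(batch)            # in sequence: accepted
                next_expected += 1
            else:
                resend.append((seq, batch))  # rejected: out of sequence
        queue = resend + queue               # retransmit in sequence order
    return log


final_log = send_with_idempotence(["b0", "b1", "b2", "b3"], lost_on_first_try={1})
print(final_log)  # ['b0', 'b1', 'b2', 'b3']
```

Even though b2 and b3 reach the broker before the retried b1, they are rejected rather than appended, so the log ends up in the original order with several batches in flight.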

radai answered Sep 28 '22 14:09