I'm studying distributed systems and referring to this old question: stackoverflow link
I really can't understand the difference between exactly-once, at-least-once and at-most-once guarantees. I have read about these concepts in Kafka, Flink, Storm and Cassandra. For instance, some people say that Flink is better because it has exactly-once guarantees, while Storm has only at-least-once.
I understand that exactly-once mode is better for latency, but at the same time it's worse for fault tolerance, right? How can I recover a stream if I don't have duplicates? And then, if this is a real problem, why is the exactly-once guarantee considered better than the others?
Can someone give me better definitions?
At-most-once is ideal for applications that need high throughput and low latency, thanks to its fire-and-forget nature. Depending on the system and the client configuration, it can be the default producer and consumer delivery semantic; at-least-once and exactly-once delivery generally require additional configuration.
Exactly-once: as the name suggests, each message is delivered once and only once. It is difficult to achieve in practice. In this case, offsets need to be managed manually.
At-least-once: as the name suggests, each message is delivered at least once. There is a real chance that a message will be delivered again as a duplicate.
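To make the point about offsets concrete, here is a minimal sketch in plain Python. It does not use any real client library: the list-based log, `consume` and `run_with_one_crash` are hypothetical names made up for illustration. The only thing it shows is how the order of "commit offset" and "process message" decides which guarantee you get when the consumer crashes in between.

```python
def consume(log, start, commit_before_processing, crash_at=None):
    """Consume log[start:] and return (processed, committed_offset).

    crash_at is the offset at which the consumer crashes, right after
    the first of the two steps (commit / process) has run.
    """
    processed = []
    committed = start
    for offset in range(start, len(log)):
        if commit_before_processing:
            committed = offset + 1            # commit first
            if offset == crash_at:
                return processed, committed   # crash before processing
            processed.append(log[offset])
        else:
            processed.append(log[offset])     # process first
            if offset == crash_at:
                return processed, committed   # crash before committing
            committed = offset + 1
    return processed, committed


def run_with_one_crash(log, commit_before_processing, crash_at):
    """Run once with a crash, then restart from the last committed offset."""
    first, committed = consume(log, 0, commit_before_processing, crash_at)
    second, _ = consume(log, committed, commit_before_processing)
    return first + second


log = ["a", "b", "c"]
at_most_once = run_with_one_crash(log, commit_before_processing=True, crash_at=1)
at_least_once = run_with_one_crash(log, commit_before_processing=False, crash_at=1)
```

Committing before processing loses "b" after the crash (at-most-once), while processing before committing handles "b" twice (at-least-once). Getting an exactly-once effect needs something extra on top of this, for example storing the offset atomically together with the result, or making the processing idempotent.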
The definitions below are quoted from the Akka documentation:
at-most-once delivery
means that for each message handed to the mechanism, that message is delivered zero or one times; in more casual terms it means that messages may be lost.
at-least-once delivery
means that for each message handed to the mechanism potentially multiple attempts are made at delivering it, such that at least one succeeds; again, in more casual terms this means that messages may be duplicated but not lost.
exactly-once delivery
means that for each message handed to the mechanism exactly one delivery is made to the recipient; the message can neither be lost nor duplicated.
The first one is the cheapest—highest performance, least implementation overhead—because it can be done in a fire-and-forget fashion without keeping state at the sending end or in the transport mechanism. The second one requires retries to counter transport losses, which means keeping state at the sending end and having an acknowledgement mechanism at the receiving end. The third is most expensive—and has consequently worst performance—because in addition to the second it requires state to be kept at the receiving end in order to filter out duplicate deliveries
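The three cost levels described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not Akka's API: `Receiver`, `ScriptedChannel`, `send_at_most_once` and `send_with_retries` are hypothetical names, and the channel follows a fixed script of failures so that the runs are repeatable.

```python
class Receiver:
    def __init__(self, deduplicate):
        self.deduplicate = deduplicate
        self.seen_ids = set()   # receiver-side state, used only when deduplicating
        self.delivered = []

    def receive(self, msg_id, payload):
        if self.deduplicate:
            if msg_id in self.seen_ids:
                return True     # duplicate: acknowledge again, do not redeliver
            self.seen_ids.add(msg_id)
        self.delivered.append(payload)
        return True             # acknowledgement


class ScriptedChannel:
    """Drops messages or acks according to a fixed script."""

    def __init__(self, script):
        self.script = list(script)   # entries: "ok", "drop_msg", "drop_ack"

    def send(self, receiver, msg_id, payload):
        outcome = self.script.pop(0) if self.script else "ok"
        if outcome == "drop_msg":
            return False             # message never arrives
        acked = receiver.receive(msg_id, payload)
        if outcome == "drop_ack":
            return False             # message arrived, but the ack was lost
        return acked


def send_at_most_once(channel, receiver, msg_id, payload):
    # Fire and forget: no sender state, no retry.
    channel.send(receiver, msg_id, payload)


def send_with_retries(channel, receiver, msg_id, payload, max_attempts=5):
    # The sender keeps the message and retries until it sees an ack.
    for _ in range(max_attempts):
        if channel.send(receiver, msg_id, payload):
            return True
    return False


# At-most-once: a dropped message is simply lost.
r1 = Receiver(deduplicate=False)
send_at_most_once(ScriptedChannel(["drop_msg"]), r1, 1, "x")

# At-least-once: a lost ack triggers a retry, so the message arrives twice.
r2 = Receiver(deduplicate=False)
send_with_retries(ScriptedChannel(["drop_ack"]), r2, 1, "x")

# Retries plus receiver-side deduplication: delivered once.
r3 = Receiver(deduplicate=True)
send_with_retries(ScriptedChannel(["drop_ack"]), r3, 1, "x")
```

After these runs, `r1.delivered` is empty, `r2.delivered` holds "x" twice and `r3.delivered` holds it once. The third case works only because the receiver remembers which message ids it has already seen, which is exactly the extra state the quoted text describes as the expensive part.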