
Difference between kafka idempotent and transactional producer setup?

When setting up a Kafka producer to use idempotent behaviour and transactional behaviour:

I understand that for idempotency we set enable.idempotence=true, and that by changing this one flag on our producer we are guaranteed exactly-once event delivery?

And for transactions, we must go further and set transactional.id=<some value>, but does setting this value also set idempotence to true?

Also, by setting one or both of the above, the producer will also set acks=all.

With the above, should I be able to add 'exactly-once delivery' by simply changing the enable.idempotence setting? If I wanted to go further and enable transactional support, on the consumer side I would only need to change their setting to isolation.level=read_committed? Does this image reflect how to set up the producer in terms of EOS?

enter image description here

Ryu S. asked Feb 18 '20 14:02




2 Answers

Yes, you understood the main concepts.

By enabling idempotence, the producer automatically sets acks to all and guarantees that each message is written to its partition exactly once, without duplicates from retries, for the lifetime of the Producer instance.

By enabling transactions, the producer automatically enables idempotence (and acks=all). Transactions let you group produce requests and offset commits, and ensure that either all of them or none of them get committed to Kafka.

When using transactions, you can configure whether consumers should only see records from committed transactions by setting isolation.level to read_committed. Otherwise, by default, they see all records, including those from aborted transactions.
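As a rough sketch of the settings discussed above, the three configurations could be assembled like this. The property keys are the Kafka client config names mentioned in this thread; the broker address, transactional id and group id are placeholder values, serializer settings are left out, and actually passing these Properties to a KafkaProducer or KafkaConsumer would require the kafka-clients library, which this snippet deliberately does not use.

```java
import java.util.Properties;

class EosConfigSketch {
    // Idempotent producer: retried sends are not written twice to a partition.
    static Properties idempotentProducer() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("enable.idempotence", "true");
        p.setProperty("acks", "all"); // implied by idempotence; spelled out for clarity
        return p;
    }

    // Transactional producer: the idempotent settings plus a transactional.id.
    static Properties transactionalProducer(String transactionalId) {
        Properties p = idempotentProducer();
        p.setProperty("transactional.id", transactionalId);
        return p;
    }

    // Consumer that only sees records from committed transactions.
    static Properties readCommittedConsumer(String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("group.id", groupId);
        p.setProperty("isolation.level", "read_committed");
        return p;
    }

    public static void main(String[] args) {
        // "order-processor-1" and "order-readers" are hypothetical names.
        System.out.println(transactionalProducer("order-processor-1"));
        System.out.println(readCommittedConsumer("order-readers"));
    }
}
```

The only difference between the two producer setups is the presence of transactional.id; the consumer side changes a single setting.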

Mickael Maison answered Nov 04 '22 06:11


Actually, idempotency by itself does not always guarantee exactly-once event delivery. Say you have a consumer that consumes an event, processes it, and produces a new event. Somewhere in this process the offset that the consumer uses must be incremented and persisted. Without a transactional producer, if the offset is persisted before the producer sends the message, the message might never be sent, and you get at-most-once delivery. If you persist it after the message is sent, persisting the offset might fail; the consumer would then read the same message again and the producer would send a duplicate, so you get at-least-once delivery. The all-or-nothing mechanism of a transactional producer prevents both scenarios, provided that you store your offsets in Kafka: the new message and the increment of the consumer's offset become a single atomic action.
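The three orderings can be illustrated with a toy in-memory simulation. This is not the Kafka client API: Broker, Strategy, step and run are hypothetical names invented for this sketch, the "topic" is a plain list, and a crash is modelled as returning early between the two writes. In the real client the atomic case would correspond to KafkaProducer calls such as beginTransaction, sendOffsetsToTransaction and commitTransaction.

```java
import java.util.ArrayList;
import java.util.List;

class ConsumeProcessProduceSketch {
    // In-memory stand-ins for an output topic and a committed consumer offset.
    static final class Broker {
        final List<String> output = new ArrayList<>();
        int committedOffset = 0;
    }

    enum Strategy { OFFSET_FIRST, SEND_FIRST, TRANSACTIONAL }

    // Handles the record at the committed offset. If crash is true, the
    // worker dies between the two writes (or before an atomic commit).
    static void step(Broker b, List<String> input, Strategy s, boolean crash) {
        int offset = b.committedOffset;
        String result = input.get(offset).toUpperCase();
        switch (s) {
            case OFFSET_FIRST:
                b.committedOffset = offset + 1;
                if (crash) return;           // output never sent: record lost
                b.output.add(result);
                break;
            case SEND_FIRST:
                b.output.add(result);
                if (crash) return;           // offset not saved: record redone
                b.committedOffset = offset + 1;
                break;
            case TRANSACTIONAL:
                if (crash) return;           // nothing applied: both discarded
                b.output.add(result);        // applied together with the offset
                b.committedOffset = offset + 1;
                break;
        }
    }

    // Processes two records, crashing once on the first attempt and then
    // restarting from the committed offset until the input is exhausted.
    static Broker run(Strategy s) {
        List<String> input = List.of("a", "b");
        Broker b = new Broker();
        boolean crashedOnce = false;
        while (b.committedOffset < input.size()) {
            boolean crash = !crashedOnce;
            crashedOnce = true;
            step(b, input, s, crash);
        }
        return b;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + run(s).output);
        }
        // OFFSET_FIRST -> [B]
        // SEND_FIRST -> [A, A, B]
        // TRANSACTIONAL -> [A, B]
    }
}
```

In the SEND_FIRST case the duplicate comes from a fresh send after a restart rather than from a retry of the same request, which is why producer idempotence alone does not remove it.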

Yuval Perelman answered Nov 04 '22 04:11